Genetic Algorithms · Hyperparameter Tuning · Optimization · scikit-learn · Cross-Validation · Python

Genetic Algorithm Hyperparameter Tuning Framework

Reusable genetic algorithm framework for hyperparameter tuning across scikit-learn models using tournament selection, crossover/mutation, elitism, and constraint-aware handling of invalid configurations.

Overview

A general-purpose genetic algorithm (GA) hyperparameter tuning framework implemented in Python. Evolves a population of candidate hyperparameter dictionaries over generations using tournament selection, uniform crossover, mutation, and elitism. Fitness is computed via 5-fold cross-validation with configurable scoring and parallelism (n_jobs). Includes a constraint-aware subclass for Logistic Regression to enforce compatible solver/penalty combinations.

The goal: explore GA as a practical alternative when grid search is too expensive, while keeping the tuner reusable across model families and robust to incompatible hyperparameter dependencies.
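
Candidates in this framework are plain hyperparameter dictionaries sampled from a per-estimator search space. A minimal sketch of that representation (the `SEARCH_SPACE` grid and `random_individual` helper are illustrative names, not the framework's actual API):

```python
import random

# Illustrative search space for a decision tree; the parameter names mirror
# scikit-learn's DecisionTreeClassifier, but the exact grid is an assumption.
SEARCH_SPACE = {
    "max_depth": [3, 5, 10, 20, None],
    "min_samples_split": [2, 5, 10],
    "criterion": ["gini", "entropy"],
}

def random_individual(space, rng=random):
    """Sample one candidate hyperparameter dictionary from the space."""
    return {name: rng.choice(values) for name, values in space.items()}

# Initial population: independent random draws from the search space.
population = [random_individual(SEARCH_SPACE) for _ in range(10)]
```

Keeping individuals as dictionaries means the same GA engine works for any estimator that accepts keyword arguments, which is what makes the tuner model-agnostic.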

Your Role

What I Built

  • GA engine: initialization, fitness scoring, tournament selection, crossover, mutation, elitism, and history tracking
  • Model-agnostic search spaces defined per estimator (tree / linear / kernel / ensemble)
  • Fitness evaluation via cross_val_score with configurable CV folds and scoring metric
  • Constraint-aware tuning for Logistic Regression (penalty→solver compatibility enforced during init/crossover/mutation)
  • Experiment harness to compare GA vs GridSearchCV vs RandomizedSearchCV under similar evaluation budgets
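
The fitness evaluation described above can be sketched as a thin wrapper around `cross_val_score`; the `fitness` function below is a hypothetical name, and returning `-inf` for rejected configurations is one simple way to let invalid individuals be outcompeted rather than crash the run:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def fitness(params, X, y, cv=5, scoring="accuracy", n_jobs=1):
    """Mean CV score of one candidate hyperparameter dictionary."""
    try:
        model = DecisionTreeClassifier(**params, random_state=0)
        return cross_val_score(model, X, y, cv=cv, scoring=scoring,
                               n_jobs=n_jobs).mean()
    except (TypeError, ValueError):
        # Estimator rejected the configuration: score it as worst-possible.
        return float("-inf")

X, y = load_iris(return_X_y=True)
score = fitness({"max_depth": 3}, X, y)
```

Because the fitness function is itself a full cross-validation, `n_jobs` is the main lever for wall-clock time on multi-core machines.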

What I Owned End-to-End

  • Core GA implementation + reusable API design (scikit-learn style class wrapper)
  • Constraint-handling strategy for dependent hyperparameters (Logistic Regression)
  • Fair-budget comparison methodology (iterations aligned across tuners)
  • Results synthesis: accuracy/runtime impact and failure cases (where grid search became infeasible)
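
The constraint-handling strategy for Logistic Regression can be sketched as a repair step applied after initialization, crossover, and mutation. The `VALID_SOLVERS` map below covers the common scikit-learn solver/penalty pairings but is an assumption here; verify it against the `LogisticRegression` docs for your version:

```python
import random

# Assumed penalty -> compatible-solvers map for LogisticRegression.
VALID_SOLVERS = {
    "l1": ["liblinear", "saga"],
    "l2": ["lbfgs", "liblinear", "newton-cg", "sag", "saga"],
    "elasticnet": ["saga"],
}

def repair(individual, rng=random):
    """Force a compatible solver after init/crossover/mutation."""
    allowed = VALID_SOLVERS[individual.get("penalty", "l2")]
    if individual.get("solver") not in allowed:
        individual["solver"] = rng.choice(allowed)
    return individual

cand = repair({"penalty": "l1", "solver": "lbfgs", "C": 1.0})
```

Repairing instead of discarding keeps the effective population size constant, so no CV budget is wasted evaluating configurations the estimator would reject.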

Technical Highlights

Architecture Decisions

  • Reusable estimator-style GA tuner (parameter-space in, best_params_/best_estimator_ out)
  • Fitness computed with 5-fold CV; optional parallelism via n_jobs
  • History tracking per generation (best/avg fitness + best params) for analysis and debugging
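
Per-generation history tracking amounts to recording three things each step; a minimal sketch (the `record_generation` name and record layout are illustrative):

```python
def record_generation(history, generation, population, scores):
    """Append best/avg fitness and the best params for one generation."""
    best_i = max(range(len(scores)), key=scores.__getitem__)
    history.append({
        "generation": generation,
        "best_fitness": scores[best_i],
        "avg_fitness": sum(scores) / len(scores),
        "best_params": population[best_i],
    })

history = []
record_generation(history, 0, [{"max_depth": 3}, {"max_depth": 5}], [0.91, 0.95])
```

The best/avg gap per generation is a quick convergence diagnostic: a collapsing gap suggests the population has lost diversity and further generations will mostly re-evaluate near-duplicates.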

Algorithms / Protocols / Constraints

  • Tournament selection for parent choice
  • Uniform crossover (per-parameter mixing)
  • Mutation by resampling from the original search space
  • Elitism to preserve the best individual across generations
  • Constraint-aware GA variant for Logistic Regression to avoid invalid solver/penalty combos
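
The four operators above can be sketched in a few lines each, working directly on hyperparameter dictionaries (function names and defaults here are illustrative, not the framework's exact API):

```python
import random

def tournament(population, scores, k=3, rng=random):
    """Tournament selection: fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=scores.__getitem__)]

def uniform_crossover(a, b, rng=random):
    """Each parameter independently inherited from either parent."""
    return {key: (a if rng.random() < 0.5 else b)[key] for key in a}

def mutate(individual, space, rate=0.1, rng=random):
    """With probability `rate`, resample a parameter from the search space."""
    return {k: rng.choice(space[k]) if rng.random() < rate else v
            for k, v in individual.items()}

def next_generation(population, scores, make_child):
    """Elitism: copy the best individual unchanged, fill the rest with children."""
    best = population[max(range(len(scores)), key=scores.__getitem__)]
    return [dict(best)] + [make_child() for _ in range(len(population) - 1)]
```

Elitism guarantees the best-so-far fitness is monotone across generations, which is why the history's best-fitness curve never regresses.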

Optimization Strategies

  • Comparable-budget evaluation (random search iterations aligned with pop_size × generations)
  • Parallel-capable fitness scoring (n_jobs) to reduce wall-clock time on multi-core machines
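
Budget alignment for a fair comparison is straightforward: the GA evaluates `pop_size × generations` configurations, so a randomized search gets the same `n_iter`. A sketch with assumed numbers and an illustrative search space:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

pop_size, generations = 10, 4
budget = pop_size * generations  # total CV evaluations the GA performs

space = {
    "max_depth": [2, 3, 5, 8, 10, None],
    "min_samples_split": [2, 5, 10, 20],
    "criterion": ["gini", "entropy"],
}
X, y = load_iris(return_X_y=True)

# Give RandomizedSearchCV the same evaluation budget as the GA.
search = RandomizedSearchCV(DecisionTreeClassifier(random_state=0), space,
                            n_iter=budget, cv=5, n_jobs=1, random_state=0)
search.fit(X, y)
```

Matching `n_iter` to the GA budget (rather than wall-clock time) compares search strategies under equal numbers of CV evaluations, which is the fairer axis when per-fit cost dominates.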

Tech Stack

Python · scikit-learn · NumPy · pandas · Jupyter

Results / Learnings

What Worked

  • Decision Tree: GA reached 0.9569 accuracy in 9.39s vs Grid Search 0.9692 in 1585.29s
  • SVM: GA reached 0.9789 accuracy in 1.78s vs Grid Search 0.9824 in 111.00s
  • Logistic Regression: GA matched Grid Search at 0.9780 accuracy (9.94s vs 212.77s)
  • Random Forest: GA achieved 0.9708 accuracy (grid search not completed after >2 days)

What I Learned

  • GA can be a strong tradeoff when grid search is prohibitively slow but you still want guided exploration
  • Small accuracy differences often cost orders of magnitude more runtime; the right tuner depends on constraints
  • Constraint handling is essential for reusable tuners (dependent hyperparameters otherwise waste most evaluations)

Tradeoffs Considered

  • GA runtime can exceed randomized search for similar accuracy, depending on population/generation settings
  • GA introduces its own hyperparameters (population size, mutation rate, crossover probability) that influence outcomes
  • Some model spaces remain expensive to evaluate because the fitness function is CV itself