Genetic Algorithms · Hyperparameter Tuning · Optimization · scikit-learn · Cross-Validation · Python

Genetic Algorithm Hyperparameter Tuning Framework

Reusable genetic algorithm framework for hyperparameter tuning across scikit-learn models using tournament selection, crossover/mutation, elitism, and constraint-aware handling of invalid configurations.

Overview

A general-purpose genetic algorithm (GA) hyperparameter tuning framework implemented in Python. Evolves a population of candidate hyperparameter dictionaries over generations using tournament selection, uniform crossover, mutation, and elitism. Fitness is computed via 5-fold cross-validation with configurable scoring and parallelism (n_jobs). Includes a constraint-aware subclass for Logistic Regression to enforce compatible solver/penalty combinations.

The goal: explore GA as a practical alternative when grid search is too expensive, while keeping the tuner reusable across model families and robust to incompatible hyperparameter dependencies.
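
Candidates in this framework are plain hyperparameter dictionaries sampled from a per-estimator search space. A minimal sketch of that representation (the `SEARCH_SPACE` grid and `random_individual` helper are illustrative names, not the framework's actual API):

```python
import random

# Illustrative search space for a decision tree; the parameter names mirror
# scikit-learn's DecisionTreeClassifier, but the exact grid is an assumption.
SEARCH_SPACE = {
    "max_depth": [3, 5, 10, 20, None],
    "min_samples_split": [2, 5, 10],
    "criterion": ["gini", "entropy"],
}

def random_individual(space, rng=random):
    """Sample one candidate hyperparameter dictionary from the space."""
    return {name: rng.choice(values) for name, values in space.items()}

# Initial population: independent random draws from the search space.
population = [random_individual(SEARCH_SPACE) for _ in range(10)]
```

Keeping individuals as dictionaries means the same GA engine works for any estimator that accepts keyword arguments, which is what makes the tuner model-agnostic.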

Your Role

What I Built

  • GA engine: initialization, fitness scoring, tournament selection, crossover, mutation, elitism, and history tracking
  • Model-agnostic search spaces defined per estimator (tree / linear / kernel / ensemble)
  • Fitness evaluation via cross_val_score with configurable CV folds and scoring metric
  • Constraint-aware tuning for Logistic Regression (penalty→solver compatibility enforced during init/crossover/mutation)
  • Experiment harness to compare GA vs GridSearchCV vs RandomizedSearchCV under similar evaluation budgets
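
The fitness evaluation described above can be sketched as a thin wrapper around `cross_val_score`; the `fitness` function below is a hypothetical name, and returning `-inf` for rejected configurations is one simple way to let invalid individuals be outcompeted rather than crash the run:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def fitness(params, X, y, cv=5, scoring="accuracy", n_jobs=1):
    """Mean CV score of one candidate hyperparameter dictionary."""
    try:
        model = DecisionTreeClassifier(**params, random_state=0)
        return cross_val_score(model, X, y, cv=cv, scoring=scoring,
                               n_jobs=n_jobs).mean()
    except (TypeError, ValueError):
        # Estimator rejected the configuration: score it as worst-possible.
        return float("-inf")

X, y = load_iris(return_X_y=True)
score = fitness({"max_depth": 3}, X, y)
```

Because the fitness function is itself a full cross-validation, `n_jobs` is the main lever for wall-clock time on multi-core machines.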

What I Owned End-to-End

  • Core GA implementation + reusable API design (scikit-learn style class wrapper)
  • Constraint-handling strategy for dependent hyperparameters (Logistic Regression)
  • Fair-budget comparison methodology (iterations aligned across tuners)
  • Results synthesis: accuracy/runtime impact and failure cases (where grid search became infeasible)
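
The constraint-handling strategy for Logistic Regression can be sketched as a repair step applied after initialization, crossover, and mutation. The `VALID_SOLVERS` map below covers the common scikit-learn solver/penalty pairings but is an assumption here; verify it against the `LogisticRegression` docs for your version:

```python
import random

# Assumed penalty -> compatible-solvers map for LogisticRegression.
VALID_SOLVERS = {
    "l1": ["liblinear", "saga"],
    "l2": ["lbfgs", "liblinear", "newton-cg", "sag", "saga"],
    "elasticnet": ["saga"],
}

def repair(individual, rng=random):
    """Force a compatible solver after init/crossover/mutation."""
    allowed = VALID_SOLVERS[individual.get("penalty", "l2")]
    if individual.get("solver") not in allowed:
        individual["solver"] = rng.choice(allowed)
    return individual

cand = repair({"penalty": "l1", "solver": "lbfgs", "C": 1.0})
```

Repairing instead of discarding keeps the effective population size constant, so no CV budget is wasted evaluating configurations the estimator would reject.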

Technical Highlights

Architecture Decisions

  • Reusable estimator-style GA tuner (parameter-space in, best_params_/best_estimator_ out)
  • Fitness computed with 5-fold CV; optional parallelism via n_jobs
  • History tracking per generation (best/avg fitness + best params) for analysis and debugging
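
Per-generation history tracking amounts to recording three things each step; a minimal sketch (the `record_generation` name and record layout are illustrative):

```python
def record_generation(history, generation, population, scores):
    """Append best/avg fitness and the best params for one generation."""
    best_i = max(range(len(scores)), key=scores.__getitem__)
    history.append({
        "generation": generation,
        "best_fitness": scores[best_i],
        "avg_fitness": sum(scores) / len(scores),
        "best_params": population[best_i],
    })

history = []
record_generation(history, 0, [{"max_depth": 3}, {"max_depth": 5}], [0.91, 0.95])
```

The best/avg gap per generation is a quick convergence diagnostic: a collapsing gap suggests the population has lost diversity and further generations will mostly re-evaluate near-duplicates.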

Algorithms / Protocols / Constraints

  • Tournament selection for parent choice
  • Uniform crossover (per-parameter mixing)
  • Mutation by resampling from the original search space
  • Elitism to preserve the best individual across generations
  • Constraint-aware GA variant for Logistic Regression to avoid invalid solver/penalty combos
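
The four operators above can be sketched in a few lines each, working directly on hyperparameter dictionaries (function names and defaults here are illustrative, not the framework's exact API):

```python
import random

def tournament(population, scores, k=3, rng=random):
    """Tournament selection: fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=scores.__getitem__)]

def uniform_crossover(a, b, rng=random):
    """Each parameter independently inherited from either parent."""
    return {key: (a if rng.random() < 0.5 else b)[key] for key in a}

def mutate(individual, space, rate=0.1, rng=random):
    """With probability `rate`, resample a parameter from the search space."""
    return {k: rng.choice(space[k]) if rng.random() < rate else v
            for k, v in individual.items()}

def next_generation(population, scores, make_child):
    """Elitism: copy the best individual unchanged, fill the rest with children."""
    best = population[max(range(len(scores)), key=scores.__getitem__)]
    return [dict(best)] + [make_child() for _ in range(len(population) - 1)]
```

Elitism guarantees the best-so-far fitness is monotone across generations, which is why the history's best-fitness curve never regresses.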

Optimization Strategies

  • Comparable-budget evaluation (random search iterations aligned with pop_size × generations)
  • Parallel-capable fitness scoring (n_jobs) to reduce wall-clock time on multi-core machines
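
Budget alignment for a fair comparison is straightforward: the GA evaluates `pop_size × generations` configurations, so a randomized search gets the same `n_iter`. A sketch with assumed numbers and an illustrative search space:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

pop_size, generations = 10, 4
budget = pop_size * generations  # total CV evaluations the GA performs

space = {
    "max_depth": [2, 3, 5, 8, 10, None],
    "min_samples_split": [2, 5, 10, 20],
    "criterion": ["gini", "entropy"],
}
X, y = load_iris(return_X_y=True)

# Give RandomizedSearchCV the same evaluation budget as the GA.
search = RandomizedSearchCV(DecisionTreeClassifier(random_state=0), space,
                            n_iter=budget, cv=5, n_jobs=1, random_state=0)
search.fit(X, y)
```

Matching `n_iter` to the GA budget (rather than wall-clock time) compares search strategies under equal numbers of CV evaluations, which is the fairer axis when per-fit cost dominates.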

Tech Stack

Python · scikit-learn · NumPy · pandas · Jupyter

Results / Learnings

What Worked

  • Decision Tree: GA reached 0.9569 accuracy in 9.39s vs Grid Search 0.9692 in 1585.29s
  • SVM: GA reached 0.9789 accuracy in 1.78s vs Grid Search 0.9824 in 111.00s
  • Logistic Regression: GA matched Grid Search at 0.9780 accuracy (9.94s vs 212.77s)
  • Random Forest: GA achieved 0.9708 accuracy (grid search not completed after >2 days)

What I Learned

  • GA can be a strong tradeoff when grid search is prohibitively slow but you still want guided exploration
  • Small accuracy differences often cost orders of magnitude more runtime; the right tuner depends on constraints
  • Constraint handling is essential for reusable tuners (dependent hyperparameters otherwise waste most evaluations)

Tradeoffs Considered

  • GA runtime can exceed randomized search for similar accuracy, depending on population/generation settings
  • GA introduces its own hyperparameters (population size, mutation rate, crossover probability) that influence outcomes
  • Some model spaces remain expensive to evaluate because the fitness function is CV itself