Introduction
This article explores hyperparameter tuning best practices within a modern MLOps pipeline that integrates Hydra, Optuna, and MLflow, alongside DVC for reproducibility. Two sample model configurations—configs/model_params/rf_optuna_trial_params.yaml and configs/model_params/ridge_optuna_trial_params.yaml—illustrate how parameter search spaces are defined and fed into Optuna. The resulting runs are tracked by MLflow, ensuring that each trial’s metrics and artifacts are documented. References to the relevant code under dependencies/modeling/ and configs/transformations/ demonstrate how tuning logic stays modular and consistent.
Hyperparameter tuning can significantly enhance model performance, but it must be approached systematically:
Search Space Design
Avoid overly broad or arbitrary parameter ranges. YAML files such as configs/model_params/rf_optuna_trial_params.yaml define explicit low/high boundaries for each parameter, focusing Optuna’s search where it is most likely to yield improvements.
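To make this concrete, the parameter specification might look like the following Python dictionary once the YAML is parsed. The keys and most of the bounds shown here are illustrative rather than copied from the project’s actual file; only the n_estimators range of 100 to 1000 is cited later in this article.

rf_search_space = {
    "hyperparameters": {
        "n_estimators": {"type": "int", "low": 100, "high": 1000},
        "max_depth": {"type": "int", "low": 5, "high": 50},
        "min_samples_split": {"type": "int", "low": 2, "high": 20},
    }
}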
Systematic Search Strategy
The optuna.trial.Trial.suggest_* methods draw values from the study’s sampler, so Bayesian strategies such as TPE converge faster than random or grid search. For instance, the utility script dependencies/modeling/optuna_random_search_util.py maps YAML config entries into trial suggestions.
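As a minimal sketch of the idea (not the project’s actual script), a study can be pointed at a TPE sampler with a fixed seed; the seed value here is a placeholder:

import optuna
from optuna.samplers import TPESampler

# TPE (Tree-structured Parzen Estimator) models promising regions of the search
# space from past trials instead of sampling uniformly like grid or random search.
sampler = TPESampler(seed=42)  # placeholder seed
study = optuna.create_study(direction="minimize", sampler=sampler)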
Robust Validation
Cross-validation or well-defined train/validation splits are specified in the config (for example, cv_splits: 5). This structure, visible in dependencies/modeling/rf_optuna_trial.py and dependencies/modeling/ridge_optuna_trial.py, helps avoid overfitting to a single hold-out set.
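A simplified objective along these lines might use scikit-learn’s cross_val_score with five folds. The dataset, the min_samples_split bounds, and the function name below are assumptions for illustration, not the repository’s code:

import numpy as np
import optuna
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the project's prepared features and target.
X, y = make_regression(n_samples=500, n_features=10, random_state=0)

def objective(trial: optuna.Trial) -> float:
    model = RandomForestRegressor(
        n_estimators=trial.suggest_int("n_estimators", 100, 1000),
        min_samples_split=trial.suggest_int("min_samples_split", 2, 20),
        random_state=0,
    )
    # cv=5 mirrors cv_splits: 5; scikit-learn negates RMSE, so flip the sign back.
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    return float(-np.mean(scores))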
Reproducibility
Hydra merges these tuning configs at runtime, while DVC tracks code and data lineage. Changes to either the code or the YAML parameters cause DVC to rerun only the affected pipeline stages.
Logging and Versioning
MLflow records each Optuna trial’s metrics and parameters—see logs/runs/2025-03-21_16-56-53/rf_optuna_trial.log for a record of how RMSE and $R^2$ were logged per trial. This centralized logging enables quick performance comparisons across parameter sets.
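The per-trial logging pattern can be approximated as follows; the run naming and metric keys are assumptions rather than the exact code in rf_optuna_trial.py:

import mlflow

def log_trial(trial_number: int, params: dict, rmse: float, r2: float) -> None:
    # One MLflow run per Optuna trial keeps that trial's params and metrics grouped.
    with mlflow.start_run(run_name=f"trial_{trial_number}"):
        mlflow.log_params(params)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)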
Configuration Management
The config folder includes specialized YAML files (e.g., configs/transformations/rf_optuna_trial.yaml) referencing configs/model_params/rf_optuna_trial_params.yaml. This consistent approach ensures each parameter is declared in one place and shared across training scripts.
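In Python terms, a Hydra entry point then receives the composed configuration. The config_name and the model_params key below are hypothetical stand-ins for the project’s actual names:

import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="configs", config_name="main", version_base=None)
def run(cfg: DictConfig) -> None:
    # Hydra composes the transformation config with the referenced model_params file,
    # so every hyperparameter bound is declared in exactly one YAML file.
    print(OmegaConf.to_yaml(cfg.model_params))

if __name__ == "__main__":
    run()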
CI/CD Integration
Because each stage is tracked in dvc.yaml, new commits automatically check if hyperparameter configs changed. If so, only those tuning stages re-run, significantly reducing compute time in a CI/CD setting.
Trial Concurrency
The pipeline config sets n_jobs_study to define how many trials run in parallel, and dependencies/modeling/validate_parallelism.py checks that value to keep the concurrency settings stable.
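The validation logic is not shown in this article, but a plausible sketch clamps the requested parallelism to the available cores before handing it to Optuna:

import os

def validate_parallelism(n_jobs_study: int) -> int:
    # Hypothetical check: never request fewer than one worker or more than the
    # machine's CPU count (the real validate_parallelism.py may apply other rules).
    available = os.cpu_count() or 1
    return max(1, min(n_jobs_study, available))

# The validated value is then passed to the study, for example:
# study.optimize(objective, n_trials=50, n_jobs=validate_parallelism(n_jobs_study))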
Model-Specific Tuning
Each model has a distinct config: one for RandomForestRegressor (configs/model_params/rf_optuna_trial_params.yaml) and another for Ridge (configs/model_params/ridge_optuna_trial_params.yaml). This separation keeps the pipeline flexible when new algorithms are added.
Oversized Parameter Ranges
Sweeping a parameter across a range like 1 to 1e6 wastes resources. In the provided config, each parameter has a carefully bounded range; for instance, n_estimators is set between 100 and 1000.
Lack of Early Stopping/Pruning
Optuna’s pruners (e.g., MedianPruner or HyperbandPruner) can halt unpromising trials early. Pruning is optional, but skipping it can inflate compute costs.
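If pruning were enabled, a median pruner could be attached to the study roughly as follows; this is an optional extension rather than something the pipeline currently does:

import optuna

# MedianPruner stops a trial whose intermediate score is worse than the median of
# earlier trials at the same step; HyperbandPruner is the other option mentioned above.
pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=2)
study = optuna.create_study(direction="minimize", pruner=pruner)

# Inside the objective, intermediate values must be reported so the pruner can act:
#     trial.report(intermediate_rmse, step=fold_index)
#     if trial.should_prune():
#         raise optuna.TrialPruned()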
Inconsistent Environments
Trials must run in identical environments with the same seeds, as done here via Hydra’s random_state: ${ml_experiments.rng_seed} reference. Inconsistent seeds or library versions complicate reproducibility.
Missing Data Lineage
Failing to store the dataset version used for tuning can lead to confusion later. This project references data_version_input in dvc.yaml and ensures all transformations are consistent.
Optuna’s core advantage is a flexible API for sampling hyperparameters:
Parameter Mapping
dependencies/modeling/optuna_random_search_util.py reads each parameter key in the YAML’s hyperparameters section (e.g., min_samples_split) and calls the appropriate suggest_* method.
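A simplified version of that dispatch might look like the sketch below; the type/low/high/choices schema is an assumption about the YAML layout, and the real utility may differ:

import optuna

def suggest_from_config(trial: optuna.Trial, hyperparameters: dict) -> dict:
    # Walk each entry of the config's hyperparameters section and call the
    # matching suggest_* method for its declared type.
    params = {}
    for name, spec in hyperparameters.items():
        if spec["type"] == "int":
            params[name] = trial.suggest_int(name, spec["low"], spec["high"])
        elif spec["type"] == "float":
            params[name] = trial.suggest_float(name, spec["low"], spec["high"])
        elif spec["type"] == "categorical":
            params[name] = trial.suggest_categorical(name, spec["choices"])
        else:
            raise ValueError(f"Unsupported parameter type: {spec['type']}")
    return params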
Study Object
Scripts like dependencies/modeling/rf_optuna_trial.py create a study with study = optuna.create_study(direction="minimize"), then call study.optimize(objective, n_trials=n_trials). The best trial’s parameters are used for final model retraining.
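Condensed, and reusing the objective, X, and y from the cross-validation sketch earlier in this article, the flow looks roughly like this:

import optuna
from sklearn.ensemble import RandomForestRegressor

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=2)  # objective, X, y come from the earlier sketch

print("Best RMSE:", study.best_value)
print("Best params:", study.best_trial.params)

# The best trial's parameters drive the final retraining step.
final_model = RandomForestRegressor(**study.best_trial.params, random_state=0).fit(X, y)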
Advanced Tuning
The same approach can incorporate pruners or custom samplers. The minimal example here uses cross-validation for objective scoring but can be extended to distributed settings.
MLflow primarily logs the results of each Optuna trial:
Experiment Naming
dependencies/modeling/rf_optuna_trial.py sets a unique experiment name, combining Hydra’s timestamp with a user-defined prefix.
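A hypothetical version of that naming logic, using Python’s datetime in place of Hydra’s runtime timestamp, could be:

import mlflow
from datetime import datetime

prefix = "rf_optuna_trial"  # user-defined prefix (assumed)
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")  # stands in for Hydra's timestamp
mlflow.set_experiment(f"{prefix}_{timestamp}")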
Metrics & Parameters
Each trial logs RMSE and $R^2$, plus the final hyperparameters. The best trial is re-fitted, and its final metrics are recorded under a separate run named “final_model.”
Artifact Tracking
Extra files, such as permutation_importances.csv, are logged as artifacts. Once the logging completes, local copies are removed to keep the workspace clean.
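That pattern reduces to a few lines; the file name comes from the article above, while the cleanup step is sketched here as a simple delete:

import os
import mlflow

artifact_path = "permutation_importances.csv"  # produced earlier in the run (assumed)
if os.path.exists(artifact_path):
    # Copy the file into the MLflow run's artifact store, then drop the local copy.
    mlflow.log_artifact(artifact_path)
    os.remove(artifact_path)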
MLflow does not control the actual search loop—that falls to Optuna. Instead, it captures the entire history of parameter settings and performance, making it easy to compare different model families or runs.
Below is an excerpt from logs/runs/2025-03-21_16-56-53/rf_optuna_trial.log. The pipeline invoked rf_optuna_trial with 2 trials, each generating an MLflow run:
[2025-03-21 17:12:00,654][dependencies.modeling.rf_optuna_trial] - Trial 0 => RMSE=764781.853 R2=0.849 (No best yet)
[2025-03-21 17:19:57,082][dependencies.modeling.rf_optuna_trial] - Trial 1 => RMSE=768083.292 R2=0.848 (Best so far=764781.853)
[2025-03-21 17:19:57,082][dependencies.modeling.rf_optuna_trial] - Training final model on best_params
The final model is trained on the best parameters from the two trials. MLflow logs all relevant data, ensuring any subsequent run or re-creation is straightforward.
This pipeline exemplifies how Hydra configs, Optuna searches, MLflow tracking, and DVC-based reproducibility combine to create a well-structured, scalable hyperparameter tuning process. Best practices include bounding parameter ranges, using robust validation, logging each trial, and maintaining data lineage. With these elements in place, teams can optimize model performance while preserving full traceability and efficient resource usage.