Lab 7-04: MLflow Experiment Tracking
Learning Goals
- Track ML experiments with MLflow: parameters, metrics, and artifacts
- Compare runs across experiments
- Register and version models in the MLflow Model Registry
- Use mlflow.pytorch.autolog() for zero-boilerplate tracking
Core Concepts
MLflow Architecture
MLflow Tracking Server
├── Experiments (logical grouping)
│   └── Runs (one training run)
│       ├── Parameters (hyperparameters)
│       ├── Metrics (loss, accuracy per step)
│       └── Artifacts (model weights, plots, code)
└── Model Registry
    └── Registered Models
        └── Versions (staging → production)
Basic Usage
import mlflow
import mlflow.pytorch

# model, train_one_epoch, and evaluate are assumed to be defined elsewhere in the lab.
# Hyperparameters live in one dict so the logged values match what the loop uses.
params = {
    "model": "DenseNet121",
    "optimizer": "Adam",
    "lr": 1e-4,
    "batch_size": 32,
    "epochs": 50,
}

mlflow.set_experiment("chest_xray_classifier")

with mlflow.start_run(run_name="densenet121_v3") as run:
    # Log hyperparameters (logged once)
    mlflow.log_params(params)

    for epoch in range(params["epochs"]):
        train_loss = train_one_epoch(...)
        val_auc = evaluate(...)
        # Log metrics per step
        mlflow.log_metrics({
            "train_loss": train_loss,
            "val_auc": val_auc,
        }, step=epoch)

    # Log trained model and register it as a new version
    # (newer MLflow releases prefer name= over artifact_path=)
    mlflow.pytorch.log_model(
        model,
        artifact_path="model",
        registered_model_name="chest_xray_classifier",
    )
    # Log any file as artifact
    mlflow.log_artifact("outputs/roc_curve.png")
    print(f"Run ID: {run.info.run_id}")
Model Registry Workflow
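import mlflow
import mlflow.pytorch
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Transition version 3 to staging
# (stages are deprecated since MLflow 2.9 in favor of model version aliases)
client.transition_model_version_stage(
    name="chest_xray_classifier",
    version=3,
    stage="Staging",
)

# Load whichever version is currently in Staging for inference
model = mlflow.pytorch.load_model(
    "models:/chest_xray_classifier/Staging"
)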
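The string passed to load_model above is a registry URI. As far as I know, MLflow accepts three forms: a version number (models:/name/3), a stage (models:/name/Staging), and, in newer releases, an alias (models:/name@champion). The helper below is a hypothetical convenience function, not part of MLflow, that sketches how the three forms are built. The alias name "champion" is an assumption for illustration.

```python
from typing import Optional, Union


def registry_model_uri(
    name: str,
    *,
    version: Optional[Union[int, str]] = None,
    stage: Optional[str] = None,
    alias: Optional[str] = None,
) -> str:
    """Build a models:/ URI from exactly one selector (hypothetical helper)."""
    selectors = [s for s in (version, stage, alias) if s is not None]
    if len(selectors) != 1:
        raise ValueError("pass exactly one of version, stage, or alias")
    if alias is not None:
        # Alias form uses '@' instead of a path segment.
        return f"models:/{name}@{alias}"
    # Version and stage forms share the same path layout.
    selector = version if version is not None else stage
    return f"models:/{name}/{selector}"


by_version = registry_model_uri("chest_xray_classifier", version=3)
by_stage = registry_model_uri("chest_xray_classifier", stage="Staging")
by_alias = registry_model_uri("chest_xray_classifier", alias="champion")
```

Pinning a version number gives reproducible inference, while the stage and alias forms follow whatever version currently holds that label.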
Autolog
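import mlflow.pytorch

# Autologging hooks into PyTorch Lightning's Trainer.fit(); trainer below is a Lightning Trainer
mlflow.pytorch.autolog(
    log_every_n_epoch=1,
    log_models=True,
    checkpoint=False,  # don't log every checkpoint
)

# Now just train: MLflow records optimizer settings, logged metrics, and the final model
trainer.fit(model, dataloader)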
Interview Questions
Q: What's the difference between log_param and log_metric?
A: Parameters are static hyperparameters logged once (learning rate, model architecture). Metrics are time-series values logged per step/epoch (loss, accuracy). MLflow stores metrics with a step index so you can plot them over training.
Q: How do you compare 10 runs and find the best model?
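A: Use client.search_runs(experiment_ids=["1"], order_by=["metrics.val_auc DESC"], max_results=10), where "1" stands for your experiment's ID. This returns up to 10 runs sorted by validation AUC, best first, so the first result is the best run. You can also sort and compare runs side by side in the MLflow UI, started with mlflow ui --port 5000.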
Q: What's the Model Registry used for?
A: It provides a governance layer: model versions move through stages (None → Staging → Production → Archived). This enables CI/CD for ML: a pipeline can require automated tests to pass before it promotes a version to Production. Multiple teams can see which version is deployed without digging through run IDs.
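One way such a gate might look is sketched below. Everything here is hypothetical (the Candidate class, the should_promote function, and the 0.005 AUC margin) and is not part of MLflow. It only illustrates the kind of check a CI job could run before promoting a version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """A model version awaiting promotion (hypothetical structure)."""
    version: int
    val_auc: float
    smoke_test_passed: bool


def should_promote(
    candidate: Candidate,
    production_auc: Optional[float],
    min_gain: float = 0.005,  # assumed margin; tune per project
) -> bool:
    """Return True only if the candidate passes tests and beats production."""
    if not candidate.smoke_test_passed:
        return False
    if production_auc is None:
        # Nothing deployed yet, so any tested candidate may go out.
        return True
    return candidate.val_auc >= production_auc + min_gain


strong = Candidate(version=4, val_auc=0.93, smoke_test_passed=True)
marginal = Candidate(version=5, val_auc=0.921, smoke_test_passed=True)
broken = Candidate(version=6, val_auc=0.99, smoke_test_passed=False)

decisions = [should_promote(c, production_auc=0.92) for c in (strong, marginal, broken)]
```

A CI job could call client.transition_model_version_stage(...) with stage="Production" only when the gate returns True.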