Lab 7-04: MLflow Experiment Tracking

Learning Goals

Track ML experiments with MLflow: parameters, metrics, and artifacts
Compare runs across experiments
Register and version models in the MLflow Model Registry
Use mlflow.pytorch.autolog() for zero-boilerplate tracking

Core Concepts

MLflow Architecture

MLflow Tracking Server
    ├── Experiments (logical grouping)
    │       └── Runs (one training run)
    │               ├── Parameters  (hyperparameters)
    │               ├── Metrics     (loss, accuracy per step)
    │               └── Artifacts   (model weights, plots, code)
    └── Model Registry
            └── Registered Models
                    └── Versions (staging → production)
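
The hierarchy above can be walked programmatically. The following is a minimal sketch, not a definitive implementation: walk_tracking_store is a hypothetical helper name, and it assumes it is given an object that behaves like mlflow.tracking.MlflowClient (search_experiments, search_runs, and list_artifacts).

```python
def walk_tracking_store(client, max_runs=5):
    """Yield (experiment_name, run_id, artifact_paths) for each run.

    `client` is assumed to expose the MlflowClient methods used below;
    the function name and the max_runs default are illustrative only.
    """
    # Experiments are the top-level grouping
    for exp in client.search_experiments():
        # Each experiment contains runs
        runs = client.search_runs(
            experiment_ids=[exp.experiment_id],
            max_results=max_runs,
        )
        for run in runs:
            # Each run owns its artifacts (model weights, plots, ...)
            artifacts = [f.path for f in client.list_artifacts(run.info.run_id)]
            yield exp.name, run.info.run_id, artifacts
```

Against a real tracking store, pass MlflowClient() as client and iterate over the result to print one line per run.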

Basic Usage

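import mlflow
import mlflow.pytorch

mlflow.set_experiment("chest_xray_classifier")

epochs = 50  # defined once so the logged value and the loop bound stay in sync

with mlflow.start_run(run_name="densenet121_v3") as run:
    # Log hyperparameters once, at the start of the run
    mlflow.log_params({
        "model": "DenseNet121",
        "optimizer": "Adam",
        "lr": 1e-4,
        "batch_size": 32,
        "epochs": epochs,
    })

    for epoch in range(epochs):
        # train_one_epoch and evaluate stand in for your own training code
        train_loss = train_one_epoch(...)
        val_auc = evaluate(...)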

        # Log metrics per step
        mlflow.log_metrics({
            "train_loss": train_loss,
            "val_auc": val_auc,
        }, step=epoch)

    # Log trained model
    mlflow.pytorch.log_model(
        model,
        artifact_path="model",
        registered_model_name="chest_xray_classifier",
    )

    # Log any file as artifact
    mlflow.log_artifact("outputs/roc_curve.png")

print(f"Run ID: {run.info.run_id}")

Model Registry Workflow

import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Transition version 3 to Staging
# (stage transitions are deprecated as of MLflow 2.9 in favor of aliases)
client.transition_model_version_stage(
    name="chest_xray_classifier",
    version=3,
    stage="Staging",
)

# Load model for inference
model = mlflow.pytorch.load_model(
    "models:/chest_xray_classifier/Staging"
)
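
The snippet above hardcodes version=3. As a hedged sketch, the version number could instead be looked up from the registry. latest_version and model_uri are hypothetical helper names, and client is assumed to behave like MlflowClient, whose search_model_versions returns entries with version and current_stage fields.

```python
def latest_version(client, name, stage=None):
    """Return the highest version number registered under `name`.

    If `stage` is given, only versions currently in that stage are considered.
    Helper name and signature are illustrative, not part of MLflow.
    """
    versions = client.search_model_versions(f"name='{name}'")
    if stage is not None:
        versions = [v for v in versions if v.current_stage == stage]
    if not versions:
        raise LookupError(f"no matching versions of {name!r} found")
    # Compare as integers so that version 10 ranks above version 9
    return max(int(v.version) for v in versions)


def model_uri(name, version_or_stage):
    """Build a models:/ URI from a version number or a stage name."""
    return f"models:/{name}/{version_or_stage}"
```

The resulting URI, for example model_uri("chest_xray_classifier", 3), can be passed to mlflow.pytorch.load_model in place of the stage-based URI.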

Autolog

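import mlflow

# mlflow.pytorch.autolog() hooks into PyTorch Lightning's Trainer;
# a plain PyTorch training loop still needs manual log_* calls
mlflow.pytorch.autolog(
    log_every_n_epoch=1,
    log_models=True,
    checkpoint=False,    # don't log every checkpoint
)
# Call autolog before training; Trainer.fit() then logs params, metrics, and the model
trainer.fit(model, dataloader)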

Q: What's the difference between log_param and log_metric?
A: Parameters are static hyperparameters logged once (learning rate, model architecture). Metrics are time-series values logged per step/epoch (loss, accuracy). MLflow stores metrics with a step index so you can plot them over training.
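
The difference also shows up when reading a run back. The sketch below is illustrative only: metric_series and param_value are hypothetical helper names, and client is assumed to behave like MlflowClient (get_metric_history and get_run).

```python
def metric_series(client, run_id, key):
    """Return [(step, value), ...] for one metric, ordered by step.

    A metric has one entry per logged step, so it comes back as a history.
    """
    history = client.get_metric_history(run_id, key)
    return sorted((m.step, m.value) for m in history)


def param_value(client, run_id, key):
    """Return the single stored value of a parameter.

    A parameter has exactly one value per run; MLflow stores it as a string.
    """
    return client.get_run(run_id).data.params[key]
```

For the run in Basic Usage, metric_series(client, run_id, "val_auc") would give one point per epoch, while param_value(client, run_id, "lr") would give a single string.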

Q: How do you compare 10 runs and find the best model?
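A: Use client.search_runs(experiment_ids=["1"], order_by=["metrics.val_auc DESC"], max_results=10), where client is an MlflowClient and "1" is the experiment ID. This returns up to 10 runs sorted by validation AUC, best first, so the first result is the best run. You can also compare runs side by side in the MLflow UI, started with mlflow ui --port 5000.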
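
As a sketch of that query wrapped in a reusable function: rank_runs is a hypothetical helper name, and it assumes client behaves like MlflowClient. It resolves the experiment by name rather than hardcoding the ID.

```python
def rank_runs(client, experiment_name, metric, n=10, higher_is_better=True):
    """Return [(run_id, metric_value), ...] for the top n runs.

    Helper name and defaults are illustrative; sorting is done by the
    tracking backend via the order_by clause.
    """
    exp = client.get_experiment_by_name(experiment_name)
    if exp is None:
        raise LookupError(f"experiment {experiment_name!r} not found")
    # AUC and accuracy sort descending; losses would sort ascending
    direction = "DESC" if higher_is_better else "ASC"
    runs = client.search_runs(
        experiment_ids=[exp.experiment_id],
        order_by=[f"metrics.{metric} {direction}"],
        max_results=n,
    )
    return [(r.info.run_id, r.data.metrics.get(metric)) for r in runs]
```

Calling rank_runs(MlflowClient(), "chest_xray_classifier", "val_auc") would list the ten best runs; the first tuple holds the run ID of the best one.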

Q: What's the Model Registry used for?
A: It provides a governance layer: each model version moves through stages (None → Staging → Production → Archived). This supports CI/CD for ML: a pipeline can require automated tests to pass before it promotes a version to Production. Multiple teams can see what's deployed without digging through run IDs.
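
A promotion gate of that kind might look like the sketch below. It is an assumption-laden example, not a prescribed workflow: promote_if_better, its min_gain threshold, and the choice of AUC as the gating metric are all hypothetical, and client is assumed to behave like MlflowClient with the stage-based API used in this lab.

```python
def promote_if_better(client, name, candidate_version,
                      candidate_auc, production_auc, min_gain=0.0):
    """Promote a model version to Production only if it beats the current one.

    Returns True if the version was promoted, False otherwise.
    The comparison rule here is illustrative; real gates often run a test suite.
    """
    if candidate_auc < production_auc + min_gain:
        # Gate failed: leave the registry untouched
        return False
    client.transition_model_version_stage(
        name=name,
        version=candidate_version,
        stage="Production",
        archive_existing_versions=True,  # move the old Production version to Archived
    )
    return True
```

A CI job could call this after evaluating the Staging version on a held-out set and fail the pipeline when it returns False.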
