Tracking the training of machine learning model is an important part of the model development lifecycle. One of the leading tools that make this process effortless is MLflow, an open-source platform that manages the machine learning lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
In this article, we will delve into the specifics of how the use of MLflow can assist in tracking model training, especially in the context of the DP-100 “Designing and Implementing a Data Science Solution on Azure” exam.
What is MLflow?
MLflow is designed to work with any machine learning library and algorithm, manage the entire lifecycle of machine learning, including experimentation, reproducibility and deployment, and consolidate all these aspects in a single user interface. It is divided into four components, which includes MLflow Tracking of which this discussion will be primarily based on.
MLflow Tracking
MLflow Tracking is one of the components of MLflow which allows you to record and query experiments, including metadata (like parameters and the name of the user who launched the experiment), the state of the system (like Git commit hash and the executing machine), and the results.
Here is a example of how to use MLflow Tracking in Python:
import mlflow
# Begin a new run
mlflow.start_run()
# Log a parameter (key-value pair)
mlflow.log_param(“param1”, 5)
# Log a metric; metrics can be updated throughout the run
mlflow.log_metric(“foo”, 1)
mlflow.log_metric(“foo”, 2)
mlflow.log_metric(“foo”, 3)
# Log an artifact (output file)
with open(“output.txt”, “w”) as f:
f.write(“Hello, world!”)
mlflow.log_artifact(“output.txt”)
# End the run
mlflow.end_run()
This logs parameters, metrics, and an artifact of an ML model in a structured way.
Using MLflow with Azure
You can use Azure Machine Learning to track your runs in the cloud and access them at scale. You use the Azure Machine Learning SDK to log metrics and upload output to Azure Machine Learning. Then, you can use the Azure Machine Learning studio to view and organize your runs and outputs.
Here’s an example of how to use the Azure Machine Learning SDK within an MLflow run to log metrics:
from azureml.core import Run
import mlflow
import mlflow.sklearn
# start a training run by defining an experiment name
with mlflow.start_run(experiment_id=
# Log a parameter (key-value pair)
mlflow.log_param(“num_dimensions”, 8)
# Log a metric; metrics can be updated throughout the run
mlflow.log_metric(“accuracy”, 0.95)
mlflow.log_metric(“precision”, 0.90)
mlflow.log_metric(“recall”, 0.88)
# log the model using the sklearn format
mlflow.sklearn.log_model(lr, “model”)
run = Run.get_context()
run.log(“Accuracy”, 0.95)
# end the run
mlflow.end_run()
This example demonstrates how you can start an MLflow run, log parameters and metrics to Azure Machine Learning, and then log a trained scikit-learn model in MLflow’s model format.
In conclusion, managing and tracking model training benefits from efficient tools such as MLflow. With MLflow Tracking, it is much easier to record and query experiments: code, data, config, and results, among others. These capabilities make the preparation for exams such as DP-100 “Designing and Implementing a Data Science Solution on Azure” simpler and more productive, given the hands-on skills acquired in tracking machine learning model training.
Practice Test
True or False: MLflow is a tool that aids in managing the Machine Learning lifecycle, including tracking of model training.
- True
- False
Answer: True
Explanation: MLflow is an open source platform used to manage the ML lifecycle, including experimentation, model tracking, deployment, and serving.
MLflow can be used in which of the following cloud platforms?
- A. Google Cloud
- B. AWS
- C. Azure
- D. All of the above
Answer: D. All of the above
Explanation: MLflow is platform agnostic and can be used across various cloud platforms, including Google Cloud, AWS, and Azure.
True or False: MLFlow only works with Python programming language.
- True
- False
Answer: False
Explanation: MLflow is language-agnostic. It not only works with Python, but also with Java, R and other programming languages.
In MLflow, the Metrics functionality is used to:
- A. Store parameters that do not change the model training process
- B. Store measurable parameters that change during model training
- C. Store model cost
- D. None of the above
Answer: B. Store measurable parameters that change during model training
Explanation: Metrics in MLflow are for tracking measurable parameters that change during the model training process, such as accuracy and loss values.
True or False: In Azure, we can integrate MLflow with Azure Machine Learning.
- True
- False
Answer: True
Explanation: Azure Machine Learning supports MLflow integration which makes it easy for developers to track models and metrics.
Which component of MLflow would you use to share models across different programming languages?
- A. MLflow Projects
- B. MLflow Tracking
- C. MLflow Models
- D. MLflow Server
Answer: C. MLflow Models
Explanation: MLflow Models is a packaging format for sharing and distributing models across different programming languages.
MLflow is used to:
- A. deploy machine learning models
- B. track machine learning model training
- C. package and share machine learning models
- D. All of the above
Answer: D. All of the above
Explanation: MLflow supports model deployment, tracking, as well as packaging and sharing of ML models.
True or False: In Azure, you cannot use the MLflow UI to explore the tracked run details.
- True
- False
Answer: False
Explanation: The MLflow UI can be used in Azure to explore tracked run details, including parameters, metrics, tags, and artifacts.
MLflow Tracking is used for:
- A. Recording model parameters only
- B. Storing final model performance metrics only
- C. Tracking experiments and recording parameters and metrics
- D. None of the above
Answer: C. Tracking experiments and recording parameters and metrics
Explanation: MLflow Tracking is used for both recording model parameters as well as storing different metrics of model performance.
True or False: MLFlow does not support logging of custom artifacts like images or data frames.
- True
- False
Answer: False
Explanation: MLFlow supports logging of custom artifacts, which can include anything from images, data frames to whole directories.
Interview Questions
What is MLflow?
MLflow is an open-source platform to manage the Machine Learning lifecycle, that includes experimentation, reproducibility, and deployment. It is platform agnostic and can be used with any machine learning library, algorithm, deployment tool or language.
How does MLflow help in tracking model training?
MLflow provides extensive capabilities to log and query experiments. Its Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code and for later visualizing the results. This way, it keeps detailed tracking of model training phases.
What are the main components of MLflow?
The main components of MLflow are: MLflow Tracking, MLflow Projects, MLflow Models, and MLflow Registry.
What is the role of MLflow Tracking in model training?
MLflow Tracking is used to record and query experiments; this includes logging parameters, code versions, metrics, and output files in machine learning code and visualizing results.
How is Azure related to MLflow?
MLflow can be used on Azure as part of Azure Machine Learning service. The Azure Machine Learning service expands on MLflow’s capabilities to track metrics, parameters, and artifacts of the machine learning model in Azure.
Can we use MLflow with any Machine learning library and language?
Yes, MLflow is designed to work with any machine learning library and language.
How can MLflow promote experiment reproducibility?
MLflow can log all aspects of a machine learning model training session, including parameters, code versions, and results. This allows for easier experiment replication and comparison, and facilitates collaboration among multiple data scientists.
“DP-100 Designing and Implementing a Data Science Solution on Azure” includes MLflow. Can you share why?
This is because Azure supports MLflow as part of its Machine Learning services. The integration allows data scientists to use the comprehensive capabilities of MLflow right within Azure.
What is the role of MLflow Models?
MLflow Models is a convention for packaging machine learning models in multiple formats called flavours. These can be easily used for inference via various tools and frameworks.
What is the MLflow Registry used for?
MLflow Registry is used for central model repository with the ability to transition a model’s stages, annotate them with descriptions, and more. This helps in managing collaborative model development across the data science team.
Can MLflow integrate with deployment tools?
Yes, MLflow is designed to integrate with popular deployment tools, making it easier for data scientists to put their models into production.
How can MLflow assist with version control in machine learning training models?
MLflow allows data scientists to keep track of different versions of their models, along with the parameters and metrics for each run. This helps maintain a history of model development and allows for easy comparison between different models.
In terms of MLflow, what is an “experiment”?
In terms of MLflow, an “experiment” refers to a specific machine learning task. Each experiment contains a collection of “runs”, which are individual program executions, such as single model training sessions.
How does MLflow’s UI support model tracking?
MLflow’s UI allows data scientists to visualize the parameters, metrics and artifacts for each run in an experiment. It provides a comprehensive dashboard to track and compare the performance of different models.
Is MLflow supported on all cloud platforms?
Yes, MLflow is a cloud-agile platform and can be deployed on any cloud platform, including Azure. It’s designed to work effectively across multiple cloud environments.