The Azure platform provides a range of options that can be mixed and matched to fit your use case. This section covers two of the main ones: Azure Machine Learning (Azure ML) and Azure Databricks. Each has distinct features and suits different types of machine learning workloads.
Azure Machine Learning
Azure Machine Learning is a cloud-based environment for training, deploying, automating, managing, and tracking ML models. It offers an end-to-end machine learning platform to enable data scientists and developers to build, train, and deploy machine learning models faster.
Features of Azure ML include:
- Built-in notebooks for code execution.
- Support for both code-first and no-code experiences.
- Distributed training of deep learning models.
- Automated machine learning capability which accelerates model development.
- MLOps capabilities for managing the entire ML lifecycle.
Here is an example of how to train a simple linear regression model and log its result using Azure Machine Learning.
from azureml.core import Workspace, Experiment, Run
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# access workspace
ws = Workspace.from_config()
exp = Experiment(workspace=ws, name="linear_exp")
# start run logger
run = exp.start_logging()
# load dataset (the * is a placeholder for your own data source)
data = *
# split dataset (the * stands for your feature and label arrays)
train_x, test_x, train_y, test_y = train_test_split(*)
# train model
lin_model = LinearRegression()
lin_model.fit(train_x, train_y)
# test model
pred_y = lin_model.predict(test_x)
mse = mean_squared_error(test_y, pred_y)
# log the metric, then end the run
run.log("mse", mse)
run.complete()
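The example above leaves the dataset as a placeholder. As a library-free sketch of the same split, fit, and evaluate steps, the block below fits a one-feature least-squares line to synthetic data and computes the mean squared error by hand. The synthetic data, the helper names `fit_line` and `mse`, and the 80/20 split are assumptions made for illustration; this is not the scikit-learn or Azure ML implementation.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic data (assumption): y = 3x + 2 plus a little Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 0.1) for x in xs]

# Shuffle the indices and hold out 20% of the rows for testing.
indices = list(range(len(xs)))
rng.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]
train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]

# Fit on the training rows only, then score on the held-out rows.
slope, intercept = fit_line(train_x, train_y)
pred_y = [slope * x + intercept for x in test_x]
test_mse = mse(test_y, pred_y)
print(f"slope={slope:.3f} intercept={intercept:.3f} mse={test_mse:.4f}")
```

In an Azure ML run, the resulting value would be what gets passed to `run.log("mse", ...)`.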
Azure Databricks
Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. It provides a collaborative environment for data scientists, engineers, and business analysts to work together.
Features of Azure Databricks include:
- A fully managed Apache Spark environment.
- Collaborative notebooks.
- One-click setup and elastic scaling.
- Integrated workflows (for building, training and deploying models).
- Advanced security and governance.
For machine learning workloads, Databricks is particularly effective at big data processing and at running large-scale Spark jobs.
Below is an example of how you could use Azure Databricks to train a logistic regression model using Spark MLlib.
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
# initialize SparkSession
spark = SparkSession.builder \
    .appName("Logistic Regression") \
    .getOrCreate()
# load data in LIBSVM format (the "*" is a placeholder for your dataset path)
df = spark.read.format("libsvm").load("*")
# split data for training and testing
(train_data, test_data) = df.randomSplit([0.7, 0.3])
# define and fit model
lr = LogisticRegression()
lr_model = lr.fit(train_data)
# Test model
predictions = lr_model.transform(test_data)
# evaluator
evaluator = BinaryClassificationEvaluator()
roc_auc = evaluator.evaluate(predictions)
print("ROC AUC: ", roc_auc)
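`BinaryClassificationEvaluator` reports the area under the ROC curve by default. That quantity can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it by direct pair counting in plain Python; the function name `roc_auc` and the toy labels and scores are assumptions for illustration, and this is not how Spark computes the metric internally.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data (assumption): two negatives and two positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print("ROC AUC:", auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic, so it is only meant for small illustrative inputs.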
Conclusion
To conclude, both Azure ML and Azure Databricks have their strengths and roles within the Azure ecosystem. The best choice for a given use case depends on the project's requirements and constraints. A good machine learning engineer understands the capabilities of each and knows how to use them effectively.
On the Microsoft Azure DP-100 exam, you might be asked to select the most appropriate machine learning environment for a given scenario. A thorough understanding of what each environment offers is key to selecting the right one. Remember that the choice of environment is not just a technical decision but also an economic one, because it can significantly affect cost. Using the options within Azure effectively helps optimize cost while still meeting all technical requirements.
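The selection guidance in this section can be condensed into a small rule-of-thumb helper. The sketch below is only an illustration: the function `suggest_environment`, its parameters, and the 100 GB single-node threshold are all hypothetical, and a real decision would also weigh cost, team skills, and compliance needs.

```python
AZURE_ML = "Azure Machine Learning"
DATABRICKS = "Azure Databricks"


def suggest_environment(needs_spark, data_size_gb, wants_no_code,
                        single_node_limit_gb=100.0):
    """Return a suggested environment from three coarse scenario traits.

    The threshold is an assumed placeholder, not an Azure limit.
    """
    # Azure ML is the option described as supporting no-code experiences.
    if wants_no_code:
        return AZURE_ML
    # Databricks is described as the fit for big data and large Spark jobs.
    if needs_spark or data_size_gb > single_node_limit_gb:
        return DATABRICKS
    # Default to the end-to-end ML platform for everything else.
    return AZURE_ML


scenarios = [
    ("tabular model built with a visual designer", False, 5.0, True),
    ("feature engineering over a large Spark pipeline", True, 2000.0, False),
    ("scikit-learn model on a modest dataset", False, 2.0, False),
]
for description, needs_spark, size_gb, no_code in scenarios:
    choice = suggest_environment(needs_spark, size_gb, no_code)
    print(f"{description}: {choice}")
```

The two services are not mutually exclusive, so a helper like this is best read as a starting point for discussion rather than a final answer.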
Practice Test
Azure Batch AI is used to deploy models as a web service.
- True
- False
Answer: False
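Explanation: Azure Batch AI was used for training and testing machine learning models at cloud scale, while the Azure ML service is used for deploying machine learning models as a web service. Note that Azure Batch AI has since been retired, with its training capabilities available through Azure Machine Learning compute.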
In Azure Machine Learning, you can build, train, and deploy models in any Python environment.
- True
- False
Answer: True
Explanation: Azure Machine Learning supports open-source technologies, so models can be built, trained, and deployed in any Python environment.
When choosing an environment for a Machine Learning use case, one needs to consider the cost implications.
- True
- False
Answer: True
Explanation: Cost is a key factor to consider when selecting an environment for a machine learning use case. The choice of environment may affect the cost of computation, storage and data transfer.
Azure Machine Learning is a Microsoft Azure product that you can use to train, deploy, automate, manage, and track ML models.
- True
- False
Answer: True
Explanation: Azure Machine Learning provides a complete set of services that developers and data scientists can use to develop, train, and deploy machine learning models.
The Azure Machine Learning service does not support both code-first and drag-and-drop workflows for building models.
- True
- False
Answer: False
Explanation: The Azure Machine Learning service supports both code-first and drag-and-drop workflows for building models.
Which Azure service enables real-time analytics and complex event processing on streaming data?
- Azure Stream Analytics
- Azure Data Factory
- Azure Machine Learning
- Azure Data Lake Storage
Answer: Azure Stream Analytics
Explanation: Azure Stream Analytics is a real-time analytics and complex event-processing engine designed to analyze and visualize streaming data in real time.
Azure Machine Learning Studio is a cloud-based drag-and-drop tool that you can use to build, test, and deploy predictive analytics solutions.
- True
- False
Answer: True
Explanation: Azure Machine Learning Studio is an interactive, visual workspace to easily build, test, and iterate on a predictive analysis model.
Which of the following Azure services is used for batch scoring and inferencing?
- Azure Machine Learning Studio
- Azure Batch AI
- Azure Data Factory
- All of these
Answer: Azure Batch AI
Explanation: Azure Batch AI is used for batch scoring and inferencing, which allows you to test your models against large volumes of data.
Using Azure Machine Learning service, you can use any Python machine learning library for building models.
- True
- False
Answer: True
Explanation: Azure Machine Learning service fully supports open-source technologies, hence you can use any Python machine learning library for building models.
The suitable environment for a machine learning use case does not depend on the volume and velocity of the data handled.
- True
- False
Answer: False
Explanation: The volume and velocity of data are significant factors when selecting an environment for a machine learning use case. The right choice can vary with the size of the data and the speed at which it arrives.
Interview Questions
What are the three critical factors to consider while choosing an environment for a machine learning use case in Azure?
The three critical factors are data privacy and regulations, scalability and speed of the models, and availability of appropriate tools and resources in the environment.
Does Azure Machine Learning provide support for open-source technologies?
Yes, Azure Machine Learning provides robust support for popular open-source technologies such as Python, PyTorch, TensorFlow, and scikit-learn.
What is one advantage of the managed compute infrastructure offered by Azure?
The managed compute infrastructure offered by Azure scales on demand, which makes it practical to run compute-heavy machine learning tasks such as cross-validation and hyperparameter tuning.
How does Azure Machine Learning fit in the context of regulatory compliance?
Azure Machine Learning holds more than 50 compliance certifications and supports workloads subject to regulations such as HIPAA and GDPR. It can therefore be used where data privacy and regulatory requirements are crucial.
What should be considered when determining the cost-effectiveness of deploying Machine Learning models in Azure?
Things to consider include compute resource usage, data storage demands, the number of function calls to the API endpoints, and network data transfer costs.
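The four cost drivers listed above can be combined into a rough monthly estimate. The sketch below assumes a simple linear pricing model; the class names, field names, and every rate shown are hypothetical placeholders, not Azure prices, and real bills depend on SKU, region, and pricing tier.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    compute_hours: float   # hours of training or inference compute
    storage_gb: float      # data and model artifacts kept in storage
    api_calls: int         # requests sent to deployed endpoints
    egress_gb: float       # data transferred out over the network


@dataclass
class Rates:
    """Hypothetical unit prices; replace with figures from your own pricing sheet."""
    per_compute_hour: float
    per_storage_gb: float
    per_million_calls: float
    per_egress_gb: float


def estimate_monthly_cost(usage, rates):
    """Return a per-category breakdown plus the total, assuming linear pricing."""
    breakdown = {
        "compute": usage.compute_hours * rates.per_compute_hour,
        "storage": usage.storage_gb * rates.per_storage_gb,
        "api_calls": usage.api_calls / 1_000_000 * rates.per_million_calls,
        "network": usage.egress_gb * rates.per_egress_gb,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Made-up usage and rates, purely to show the calculation.
usage = MonthlyUsage(compute_hours=200, storage_gb=500, api_calls=2_000_000, egress_gb=50)
rates = Rates(per_compute_hour=0.5, per_storage_gb=0.02, per_million_calls=1.0, per_egress_gb=0.08)
cost = estimate_monthly_cost(usage, rates)
for category, amount in cost.items():
    print(f"{category:>10}: {amount:.2f}")
```

Keeping the breakdown per category makes it easy to see which driver dominates before comparing environments.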
What is the role of Azure Machine Learning Studio?
Azure Machine Learning Studio provides a visual interface for developing, training, and deploying machine learning models. Its drag-and-drop interface especially benefits those with limited programming skills.
Can Azure Machine Learning be integrated with other Azure services?
Yes, Azure Machine Learning can be integrated with several Azure services such as Azure Databricks for big data processing and Azure IoT Edge for deploying machine learning models on IoT devices.
How does Azure Machine Learning support model interpretability?
Azure Machine Learning offers a model interpretability toolkit that helps data scientists understand how models behave and why they make particular predictions.
Can you create and deploy machine learning models on Azure without any coding?
Yes, you can use the Azure Machine Learning Designer, a drag-and-drop tool, to create your experiments and deploy models without writing any code.
What is an Azure Machine Learning workspace?
An Azure Machine Learning workspace is a fundamental resource for machine learning in Azure. It provides a centralized place to work with all the artifacts created during your experimentation.
How can Azure Machine Learning help with model management?
Azure Machine Learning helps with model management by providing robust model versioning, tracking, and monitoring capabilities.
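To make the idea of model versioning concrete, the sketch below keeps an in-memory registry in which registering a model under an existing name creates the next version number. The `ModelRegistry` class, its methods, and the artifact paths are hypothetical and only illustrate the concept; they are not the Azure Machine Learning API.

```python
class ModelRegistry:
    """Toy registry: each model name maps to an ordered list of versions."""

    def __init__(self):
        self._models = {}

    def register(self, name, artifact_path, metrics=None):
        """Store a new version under `name` and return its record."""
        versions = self._models.setdefault(name, [])
        record = {
            "name": name,
            "version": len(versions) + 1,   # versions start at 1 and increment
            "path": artifact_path,
            "metrics": dict(metrics or {}),
        }
        versions.append(record)
        return record

    def get(self, name, version):
        """Fetch one specific version; raises KeyError if it does not exist."""
        versions = self._models.get(name, [])
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]

    def latest(self, name):
        """Fetch the most recently registered version of `name`."""
        versions = self._models.get(name)
        if not versions:
            raise KeyError(f"no model registered under {name}")
        return versions[-1]


registry = ModelRegistry()
registry.register("linear_model", "outputs/model_v1.pkl", {"mse": 0.42})
registry.register("linear_model", "outputs/model_v2.pkl", {"mse": 0.31})
newest = registry.latest("linear_model")
print(newest["version"], newest["metrics"])
```

Storing metrics alongside each version is what makes it possible to compare candidates and roll back to an earlier one.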
What provision does Azure provide for training large models?
Azure supports distributed training of deep learning models, which lets users train large models across multiple compute nodes and shortens training time.
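One common form of distributed training is synchronous data parallelism: each worker computes a gradient on its own shard of the data, and the gradients are averaged before the shared weights are updated. The sketch below simulates that on a single machine for a one-parameter model. The function names, the four "workers", and the synthetic data are assumptions for illustration; a real job would rely on a framework's distributed backend rather than a Python loop.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, learning_rate):
    """One synchronous update: per-worker gradients are computed, then averaged."""
    # In a real cluster each of these calls would run on a different node.
    gradients = [local_gradient(w, shard) for shard in shards]
    total = sum(len(shard) for shard in shards)
    # Weight by shard size so the average equals the full-batch gradient.
    averaged = sum(g * len(s) for g, s in zip(gradients, shards)) / total
    return w - learning_rate * averaged


# Synthetic data generated from y = 2x, split across four simulated workers.
data = [(float(x), 2.0 * x) for x in range(1, 9)]
shards = [data[i:i + 2] for i in range(0, len(data), 2)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, learning_rate=0.01)
print(f"learned weight: {w:.4f}")
```

Because the averaged gradient matches the full-batch gradient, the simulated workers reach the same weight as single-node training. The speedup in practice comes from running the per-shard computations at the same time.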