In order to carry out tasks on Azure, one needs to understand how to configure the job run settings for a script.
This process mainly involves determining the environment in which the script will run and the resources that are required for its execution. In this article, we delve further into the topic, as it is an integral part of the DP-100 Designing and Implementing a Data Science Solution on Azure exam.
Creating an Azure ML Environment
The running environment of a script in Azure is defined by an Azure ML Environment
. An environment consists of a Python package, a Docker container image, and a set of specifications for the resources required. It is possible to create and manage custom environments or utilize the pre-built ones provided by Azure.
from azureml.core import Environment
# create a new environment
my_env = Environment(name="my_env")
# specify Python packages
my_env.python.conda_dependencies.add_pip_package("numpy==1.17.0")
The pre-built environments provided by Azure are managed environments that come with various popular data science packages.
Compute Targets
A Compute Target
in Azure is the resource where the training script is run. It could be a local machine or a cloud-based resource such as Azure Machine Learning Compute (a managed service), Azure Databricks, or Azure Kubernetes Service (AKS).
Creating and managing an Azure Machine Learning Compute, for instance, may involve:
from azureml.core.compute import ComputeTarget, AmlCompute
# specify the configuration (name and type of VM, maximum nodes, etc.)
provisioning_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D1_V2", max_nodes=4)
# create the cluster
compute_target = ComputeTarget.create(ws, "my_cluster", provisioning_config)
Run Configuration
The Run Configuration
brings together the compute target and the environment. It predetermines how the script is run, which includes the environment, script parameters, and other necessary details.
A run configuration can be defined as follows:
from azureml.core.runconfig import RunConfiguration
# create a new runconfig object
run_config = RunConfiguration()
# set the compute target
run_config.target = "my_cluster"
# set the environment
run_config.environment = my_env
Script Run Configuration
Script Run Configuration
then merges everything to create an executable job for the script. It includes the run configuration, script file, arguments, framework-specific configurations, and dataset bindings.
from azureml.core import ScriptRunConfig
# specify the script file and arguments
src = ScriptRunConfig(source_directory="scripts",
script="train.py",
arguments=["--data-folder", dataset.as_named_input("input").as_mount(),
"--regularization", 0.5],
run_config=run_config)
With pre-determined constraints with which the script runs, you can effectively manage and control the performance and cost of your Azure operations. This becomes increasingly vital as you handle larger datasets and more complex computations.
Understanding these components and being able to implement them effectively plays a crucial role in preparing for the DP-100 Designing and Implementing a Data Science Solution on Azure exam.
Practice Test
True or False: You cannot submit a script to an Azure ML Compute target from Jupyter Notebook.
- True
- False
Answer: False
Explanation: You can submit a script to an Azure ML Compute target from Jupyter Notebook. The notebook’s role is to orchestrate the resources and process the results while the script runs remotely.
True or False: When working with job run configuration in Azure ML, the environment should include every package that your training script depends on.
- True
- False
Answer: True
Explanation: The environment is part of the run configuration and should include every Python package that your training script depends on.
What does a PythonScriptStep do in Azure ML?
- A. Deploys a Webservice
- B. Enables you to run Python scripts as a step
- C. Creates a notebook
- D. Cleans your data
Answer: B. Enables you to run Python scripts as a step
Explanation: PythonScriptStep is used for running Python scripts as a part of a pipeline in azure ML.
The run setting for a job requires the reference of a Compute target. What does this compute target refers to?
- A. The type of code you are writing
- B. The machine learning model you have implemented
- C. The compute resource to run the script on
- D. The data you are using for training
Answer: C. The compute resource to run the script on
Explanation: The compute target in Azure ML refers to the compute resource to run the script on.
True or False: You may need to specify a run configuration for each script that submits a job.
- True
- False
Answer: True
Explanation: Yes, the run configuration contains the set of configuration settings for a particular script run.
True or False: You cannot run a script on a local machine using Azure ML.
- True
- False
Answer: False
Explanation: Through Azure ML, it is possible to run a script on either a local machine or remote compute.
What does the definition of an experiment involve?
- A. The script to run
- B. A compute target
- C. An environment of Python packages
- D. All of the above
Answer: D. All of the above
Explanation: The definition of an experiment involves the identification of the script to be run, a compute target, as well as a run setting containing the Python environment’s configuration settings.
True or False: The environment you use to run a script in Azure ML should include all necessary Python packages.
- True
- False
Answer: True
Explanation: It is necessary to ensure that the environment includes all necessary Python packages for the script to run successfully.
What does a source directory for a ScriptRunConfig contain?
- A. The script to run
- B. Other files used by the script
- C. Both A and B
- D. None
Answer: C. Both A and B
Explanation: The source directory specified in a ScriptRunConfig would contain not only the script to be executed but also any other files that this script would use.
True or False: An Experiment in Azure ML might have multiple runs.
- True
- False
Answer: True
Explanation: An Experiment can host multiple runs, each one with its specific configuration.
Interview Questions
What service in Azure is used to configure job run settings for a script?
Azure Machine Learning service is used to configure job run settings for a script.
In Azure ML, which object would you need to create to specify the configuration details for running your training script?
In Azure ML, we need to create an Estimator object to specify the configuration details for running our training script.
What can be used to directly run a script on Azure Machine Learning?
ScriptRunConfig can be used to directly run a script on Azure Machine Learning.
How can you specify the Python environment for script execution in Azure ML?
We can specify the Python environment for script execution using the environment parameter of the ScriptRunConfig.
Can you use custom Docker images for running scripts in Azure Machine Learning?
Yes, custom Docker images can be used for running scripts in Azure Machine Learning by specifying them in the environment object.
What is the purpose of the “compute_target” parameter in Azure ML’s ScriptRunConfig?
The “compute_target” parameter is used to specify the compute resource where the script will be run.
Can you monitor job status on Azure Machine Learning Service?
Yes, the job status can be monitored in the Azure Machine Learning Studio. Also, the ‘wait_for_completion’ method can be used to monitor the status of a running job.
What are the key components of the ScriptRunConfig object in Azure ML?
The key components of the ScriptRunConfig object are the script file name, command line arguments, compute target, and environment details.
How can you specify additional packages to be installed in the environment where script runs in Azure ML?
Additional packages can be specified via the CondaDependencies object in the environment where the script runs.
Can you cancel a running job in Azure Machine Learning Service?
Yes, a running job can be cancelled using ‘cancel’ method of the submitted run object.
What function would you use to submit a job configured by ScriptRunConfig?
The “submit” function of the Experiment object is used to submit a job in Azure ML.
Can you provide command-line arguments to a script in Azure ML?
Yes, command line arguments can be provided, which will be included in the ‘arguments’ parameter of the ScriptRunConfig object.
How can you run an Azure ML script directly without creating an estimator?
An Azure ML script can be run directly without creating an estimator by using the ScriptRunConfig object.
What happens when a job completes in Azure Machine Learning?
When a job completes in Azure Machine Learning, all data logged or stored in the outputs or logs directory is uploaded to the run record in Azure Machine Learning Studio.
How do you specify the location for a script file in Azure Machine Learning?
The directory that contains your scripts is specified as the source_directory in the ScriptRunConfig object.