The activities in a pipeline define the actions to perform on data. Pipelines can be scheduled to run at specific times through Azure Data Factory or Azure Synapse Studio. The DP-100 exam, Designing and Implementing a Data Science Solution on Azure, requires knowledge of how to run and schedule these pipelines.
Defining and Running a Pipeline
A pipeline must be defined before it can run. You can define one with the Azure Data Factory Copy Data tool or with a JSON definition. The following example defines a pipeline in JSON:
{
  "name": "MyPipeline",
  "properties": {
    "description": "My pipeline",
    "activities": [{
      "name": "CopyData",
      "type": "Copy",
      "inputs": [{
        "referenceName": "",
        "type": "DatasetReference"
      }],
      "outputs": [{
        "referenceName": "
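The listing above is cut off in this copy. As a rough sketch of the general shape such a definition takes, the Python snippet below assembles a single-activity copy pipeline as a dictionary and serializes it to JSON. The dataset names InputDataset and OutputDataset, the BlobSource and BlobSink types, and the helper name build_copy_pipeline are illustrative assumptions, not values recovered from the original example.

```python
import json


def build_copy_pipeline(name, input_dataset, output_dataset):
    """Return a pipeline definition dict with a single Copy activity.

    The dataset names come from the caller; the source and sink types
    are assumed here for a blob-to-blob copy.
    """
    return {
        "name": name,
        "properties": {
            "description": "My pipeline",
            "activities": [
                {
                    "name": "CopyData",
                    "type": "Copy",
                    "inputs": [
                        {"referenceName": input_dataset, "type": "DatasetReference"}
                    ],
                    "outputs": [
                        {"referenceName": output_dataset, "type": "DatasetReference"}
                    ],
                    # Assumed source/sink types, for illustration only.
                    "typeProperties": {
                        "source": {"type": "BlobSource"},
                        "sink": {"type": "BlobSink"},
                    },
                }
            ],
        },
    }


# Hypothetical dataset names; replace them with datasets that exist in your factory.
pipeline_definition = build_copy_pipeline("MyPipeline", "InputDataset", "OutputDataset")
pipeline_json = json.dumps(pipeline_definition, indent=2)
print(pipeline_json)
```

Building the definition in code and serializing it avoids hand-editing quotes and brackets, which is where JSON definitions most often break.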
Once the pipeline is defined, you can run it either manually or automatically.
To run a pipeline manually via Azure Synapse Studio:
- Open the ‘Integrate’ hub, then create a new pipeline (‘+’ and then ‘Pipeline’) or open an existing one.
- Define or select your pipeline.
- Click on ‘Add Trigger’ and select ‘Trigger Now’.
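The steps above use the Studio UI. An on-demand run can also be started programmatically; for Azure Data Factory this is a POST to the pipeline's createRun REST operation. The sketch below only builds that URL and sends nothing. The subscription ID, resource group, and factory name are placeholders, the helper name create_run_url is hypothetical, and the 2018-06-01 API version is assumed. Synapse workspaces expose a different endpoint that is not covered here.

```python
from urllib.parse import quote

API_VERSION = "2018-06-01"  # assumed Data Factory REST API version


def create_run_url(subscription_id, resource_group, factory_name, pipeline_name):
    """Build the URL for a Data Factory 'createRun' POST request.

    No request is sent; authentication (a bearer token) is left to the caller.
    """
    sub, rg, factory, pipeline = (
        quote(part, safe="")
        for part in (subscription_id, resource_group, factory_name, pipeline_name)
    )
    return (
        "https://management.azure.com"
        f"/subscriptions/{sub}"
        f"/resourceGroups/{rg}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{factory}"
        f"/pipelines/{pipeline}"
        f"/createRun?api-version={API_VERSION}"
    )


# Placeholder identifiers, not real resources.
url = create_run_url(
    "00000000-0000-0000-0000-000000000000", "my-rg", "my-factory", "MyPipeline"
)
print(url)
```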
Scheduling a Pipeline
To run a pipeline automatically, you attach a trigger to it. A trigger can start the pipeline at specific times, on specific days, or when certain conditions are met.
Creating a schedule trigger via Azure Synapse Studio:
- Open the ‘Manage’ hub, select ‘Triggers’, and click ‘+ New’.
- Enter your desired properties:
  - Name: The name of the trigger.
  - Start/End: The start and end dates/times for the trigger.
  - Frequency: How often the trigger will fire (e.g., Minute, Hour, Day).
- Add your defined pipeline to the trigger with options for:
  - Pipeline: The name of the pipeline to be triggered.
  - Parameters: Any parameter values to pass to the pipeline.
- Click on ‘Activate’ to start the trigger.
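The fields entered in the steps above map onto a trigger definition. As a minimal sketch, the function below turns a name, start and end time, frequency, and pipeline reference into a definition dictionary and rejects obviously invalid input. The function name build_schedule_trigger is hypothetical, and the pipelines list layout and the set of accepted frequency values are assumptions about how schedule triggers are usually written.

```python
from datetime import datetime, timezone

VALID_FREQUENCIES = {"Minute", "Hour", "Day", "Week", "Month"}  # assumed set


def _utc_stamp(moment):
    """Format a timezone-aware datetime as an ISO 8601 UTC timestamp."""
    if moment.tzinfo is None:
        raise ValueError("datetimes must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_schedule_trigger(name, pipeline_name, frequency, interval, start,
                           end=None, parameters=None):
    """Assemble a schedule trigger definition from the UI-style fields."""
    if frequency not in VALID_FREQUENCIES:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    if interval < 1:
        raise ValueError("interval must be at least 1")
    recurrence = {
        "frequency": frequency,
        "interval": interval,
        "startTime": _utc_stamp(start),
        "timeZone": "UTC",
    }
    if end is not None:
        if end <= start:
            raise ValueError("end must be later than start")
        recurrence["endTime"] = _utc_stamp(end)
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {"recurrence": recurrence},
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    },
                    "parameters": parameters or {},
                }
            ],
        },
    }


start = datetime(2022, 6, 16, tzinfo=timezone.utc)
trigger = build_schedule_trigger("MyTrigger", "MyPipeline", "Hour", 1, start)
print(trigger["properties"]["typeProperties"]["recurrence"])
```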
Here is an example of a schedule trigger defined in JSON:
{
  "name": "MyTrigger",
  "properties": {
    "runtimeState": "Started",
    "pipelines": [{
      "pipelineReference": {
        "referenceName": "
        "type": "PipelineReference"
      }
    }],
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Hour",
        "interval": 1,
        "startTime": "2022-06-16T00:00:00Z",
        "timeZone": "UTC",
        "schedule": {
          "minutes": [30],
          "hours": [0, 13]
        }
      }
    }
  }
}
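To see what the schedule block in this example implies, the sketch below lists the first few fire times for hours [0, 13] and minutes [30]. It assumes the trigger fires once per day at every combination of the listed hours and minutes, starting at the startTime, and it ignores any interaction with the frequency and interval values or an end time. The function name fire_times is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from itertools import islice


def fire_times(start, hours, minutes):
    """Yield fire times at or after `start`, one per (hour, minute) pair per day."""
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while True:
        for hour in sorted(hours):
            for minute in sorted(minutes):
                candidate = day.replace(hour=hour, minute=minute)
                if candidate >= start:
                    yield candidate
        day += timedelta(days=1)


start = datetime(2022, 6, 16, tzinfo=timezone.utc)
first_three = list(islice(fire_times(start, hours=[0, 13], minutes=[30]), 3))
for moment in first_three:
    print(moment.isoformat())
```

Under that assumption the pipeline would run twice a day, at 00:30 and 13:30 UTC.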
In addition to schedule triggers, you can define event-based triggers, which start the pipeline when a specific event occurs, such as the creation or deletion of a blob in Azure Blob Storage.
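As an illustration of how such a trigger narrows down which events start a run, the sketch below pairs a storage-event trigger definition with a small matching function. The container path, file suffix, trigger name, and the helper should_fire are hypothetical, and the matching logic is a simplified model of the path filters rather than the service's actual implementation.

```python
BLOB_CREATED = "Microsoft.Storage.BlobCreated"
BLOB_DELETED = "Microsoft.Storage.BlobDeleted"

# Hypothetical trigger: react only to new .csv blobs under one folder.
event_trigger = {
    "name": "MyBlobTrigger",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            "blobPathBeginsWith": "/input/blobs/incoming/",
            "blobPathEndsWith": ".csv",
            "events": [BLOB_CREATED],
        },
    },
}


def should_fire(trigger, event_type, blob_path):
    """Return True if the event type and blob path pass the trigger's filters."""
    filters = trigger["properties"]["typeProperties"]
    return (
        event_type in filters["events"]
        and blob_path.startswith(filters.get("blobPathBeginsWith", ""))
        and blob_path.endswith(filters.get("blobPathEndsWith", ""))
    )


print(should_fire(event_trigger, BLOB_CREATED, "/input/blobs/incoming/sales.csv"))
print(should_fire(event_trigger, BLOB_DELETED, "/input/blobs/incoming/sales.csv"))
```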
By defining, running, and scheduling pipelines, you can automate your data processes using Azure Data Factory or Azure Synapse Studio. This is an essential skill for those preparing for the DP-100 exam, as data professionals often need to automate the movement and transformation of data across various Azure services.
Practice Test
True or False: Azure Pipelines enables you to continuously build, test, and deploy to any platform and cloud.
- Answer: True
Explanation: Azure Pipelines is a cloud service that you can use to build and test your code project automatically and make it available to other users.
What can you do with Azure Pipelines? (Select all that are applicable)
- a) Build, test, and deploy with CI/CD that works with any language, platform, and cloud
- b) Deploy to Azure, Google Cloud, AWS or on-premises
- c) Create workflows that only support Microsoft platforms
- d) Automatically schedule a pipeline run
Answer: A, B, D
Explanation: Azure Pipelines is a robust service that lets you automate the build, test, and deployment of your applications that work with any platform. You can choose to deploy to Azure, Google Cloud, AWS, or even on-premises. Scheduling a pipeline run is also possible using this service.
True or False: You cannot schedule a pipeline in Azure Pipelines.
- Answer: False
Explanation: You can schedule a pipeline in Azure Pipelines. This allows you to automatically run pipelines at the schedule you set.
Which types of pipeline triggers are available in Azure Pipelines? (Select all that are applicable)
- a) Continuous integration (CI) triggers
- b) Scheduled triggers
- c) Batch triggers
- d) Push triggers
Answer: A, B
Explanation: Of the options listed, continuous integration (CI) triggers and scheduled triggers are trigger types in Azure Pipelines. It also supports pull request (PR) triggers and pipeline completion triggers.
True or False: Pipelines in Azure can only be run manually.
- Answer: False
Explanation: Azure allows you to run pipelines not only manually, but also on schedule or continuously depending on code changes.
Which YAML keyword is used to define a scheduled trigger in Azure Pipelines?
- a) schedules
- b) trigger
- c) timeline
- d) routine
Answer: A
Explanation: The “schedules” keyword defines scheduled triggers in an Azure Pipelines YAML file, with each schedule written as a cron entry. The “trigger” keyword, by contrast, configures CI triggers.
True or False: Scheduling a pipeline in Azure Pipelines means the run will happen instantaneously.
- Answer: False
Explanation: Scheduling a pipeline in Azure Pipelines means that the run will happen at the scheduled time, not instantaneously.
Which Azure service allows you to set up the preconditions for running a pipeline?
- a) Azure Resource Manager
- b) Azure Key Vault
- c) Azure Monitor
- d) Azure Data Factory
Answer: D
Explanation: A pipeline run in Azure Data Factory can have preconditions set to define what is required before the run can occur.
True or False: A pipeline in Azure Pipelines can only be scheduled to run once a day.
- Answer: False
Explanation: A pipeline in Azure Pipelines can be scheduled to run multiple times a day, not just once.
Which programming languages does Azure Pipelines support? (Select all that are applicable)
- a) Python
- b) Ruby
- c) Java
- d) C#
Answer: A, B, C, D
Explanation: Azure Pipelines supports a wide range of languages including Python, Ruby, Java, and C#, among others.
Can non-Microsoft source systems use Azure Pipelines for continuous integration (CI)?
- Answer: Yes
Explanation: Azure Pipelines can build code hosted in a variety of source systems, including GitHub, Bitbucket Cloud, and Subversion, not just Azure Repos.
A ‘Deployment Group’ in Azure Pipelines is:
- a) A group of resources to which you wish to deploy
- b) A collection of pipeline runs
- c) A unit of scheduling
- d) A repository of source code
Answer: A
Explanation: A Deployment Group in Azure Pipelines is a logical set of deployment target machines that can host an application.
True or False: A pipeline in Azure Pipelines can only run after the previous run has completed.
- Answer: False
Explanation: Pipeline runs in Azure Pipelines are independent, so a new run can start while an earlier run is still in progress, provided enough parallel jobs are available.
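What is the maximum duration for a single job in a pipeline in Azure Pipelines?
- a) 1 hour
- b) 6 hours
- c) 10 hours
- d) No time limit
Answer: B
Explanation: On Microsoft-hosted agents, a single job can run for up to 6 hours (360 minutes) with a paid parallel job or in a public project. The free tier for private projects caps jobs at 60 minutes, and self-hosted agents impose no such limit.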
What is a ‘Stage’ in Azure Pipelines?
- a) An environment where the code is built
- b) A pool of agents
- c) A phase of the DevOps workflow
- d) A collection of jobs
Answer: D
Explanation: A ‘Stage’, as referred to in Azure Pipelines, is essentially a collection of jobs. Typically, a stage in a pipeline represents an environment. Each stage can contain one or more jobs.
Interview Questions
What is a pipeline in the context of Azure ML?
In Azure ML, a pipeline is a workflow of machine learning tasks where each task is encapsulated as a step in the pipeline.
How do you create a pipeline in Azure ML?
You create a pipeline by instantiating a Pipeline object and passing it a list of steps (tasks) that you have defined. The position of a step in the list does not fix when it runs; Azure ML works out the execution order from the data dependencies between steps.
How frequently can pipelines be scheduled to run in Azure ML?
In Azure ML, pipelines can be scheduled to run at various frequencies ranging from once per minute to once per month.
Can a pipeline be run manually in Azure ML?
Yes, a pipeline can be run manually in Azure ML. With the Azure ML SDK, you submit the pipeline to an experiment to start a run on demand. A published pipeline can also be started by calling its REST endpoint.
What does it mean to ‘publish’ a pipeline in Azure ML?
Publishing a pipeline in Azure ML makes it available as a REST endpoint. This means the pipeline can be invoked from a variety of programming languages, not just those supported by the Azure ML SDK.
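As a hedged sketch of what calling such an endpoint can look like, the snippet below builds (but does not send) a POST request using only the Python standard library. The endpoint URL, token, experiment name, and parameter are placeholders. The ExperimentName and ParameterAssignments body fields follow the commonly documented request shape and should be treated as assumptions to check against your SDK version.

```python
import json
import urllib.request


def build_pipeline_request(endpoint, token, experiment_name, parameters=None):
    """Build a POST request for a published pipeline's REST endpoint.

    The request is only constructed here; pass it to urllib.request.urlopen
    to actually send it.
    """
    body = {"ExperimentName": experiment_name}
    if parameters:
        body["ParameterAssignments"] = parameters
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder endpoint and token, not real values.
request = build_pipeline_request(
    "https://example.invalid/pipelines/placeholder-id",
    token="<access-token>",
    experiment_name="my-experiment",
    parameters={"learning_rate": 0.01},
)
print(request.get_method(), request.full_url)
```

Because the call is plain HTTP with a JSON body, any language with an HTTP client can start the pipeline the same way.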
How do you schedule a pipeline to run at a specific time every day in Azure ML?
In Azure ML, a schedule can be defined with a recurrence (a frequency and interval, optionally with specific hours and minutes) or, in SDK v2 and CLI v2, with a cron expression. For example, the cron expression “0 4 * * *” runs a pipeline at 4 a.m. every day.
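To make the five fields of that expression explicit, here is a small sketch that splits a standard five-field cron expression into named parts and builds a daily expression from an hour and minute. The helper names parse_cron and daily_cron are hypothetical, and the parser does no validation beyond counting fields.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expression):
    """Split a five-field cron expression into a dict of named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


def daily_cron(hour, minute=0):
    """Return a cron expression that fires once a day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return f"{minute} {hour} * * *"


fields = parse_cron("0 4 * * *")
print(fields)
print(daily_cron(4))
```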
Can the output from one step in a pipeline be used as input in a subsequent step?
Yes, the output from one step in a pipeline can be used as input in a subsequent step. This is done by specifying the input to the subsequent step as the output of a prior step.
What is the recommended way to manage dependencies between steps in a pipeline in Azure ML?
Azure ML automatically manages dependencies between steps in a pipeline. This is done by specifying the input to a step as the output of the step it depends on.
What is the difference between a published pipeline and a pipeline draft in Azure ML?
A published pipeline in Azure ML is versioned and assigned a unique ID, and it can be invoked through the Azure ML REST API. A pipeline draft, on the other hand, is a pipeline that has been saved but not yet published.
What happens to a pipeline’s run history in Azure ML when the pipeline is updated and republished?
When a pipeline is updated and republished in Azure ML, the run history of the previous version remains intact. The new version of the pipeline will have its own separate run history.
What is the purpose of the PipelineData object in Azure ML?
The PipelineData object in Azure ML is a special kind of data reference that is used to manage intermediate data in pipelines. Data stored in a PipelineData object can be used as input or output for pipeline steps.
How do you ensure a step in a pipeline only runs when its inputs have changed?
In Azure ML, this is controlled by the allow_reuse parameter on the step, which defaults to True. When it is True and the step's script, inputs, and parameters are unchanged since the last run, Azure ML reuses the previous output instead of running the step again.
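The idea behind step reuse can be illustrated with a toy cache that fingerprints a step's script, inputs, and parameters. This is only a conceptual model, not how Azure ML implements reuse internally; the class ReuseCache and the function step_fingerprint are hypothetical names.

```python
import hashlib
import json


def step_fingerprint(script_source, inputs, parameters):
    """Hash everything that should invalidate a cached step result."""
    payload = json.dumps(
        {"script": script_source, "inputs": inputs, "parameters": parameters},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ReuseCache:
    """Toy model of step reuse: rerun only when the fingerprint changes."""

    def __init__(self):
        self._outputs = {}
        self.executions = 0

    def run_step(self, script_source, inputs, parameters, compute, allow_reuse=True):
        key = step_fingerprint(script_source, inputs, parameters)
        if allow_reuse and key in self._outputs:
            return self._outputs[key]  # reuse the earlier output
        self.executions += 1
        result = compute(inputs, parameters)
        self._outputs[key] = result
        return result


def total(inputs, parameters):
    return sum(inputs) * parameters["scale"]


cache = ReuseCache()
cache.run_step("total.py v1", [1, 2, 3], {"scale": 2}, total)
cache.run_step("total.py v1", [1, 2, 3], {"scale": 2}, total)  # reused
cache.run_step("total.py v1", [1, 2, 4], {"scale": 2}, total)  # inputs changed
print(cache.executions)
```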
Can different steps in a pipeline run in parallel in Azure ML?
Yes, different steps in a pipeline can run in parallel if there are no dependencies between them. Azure ML automatically determines the best execution order and parallelization for the steps in a pipeline.
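How an execution order and parallel groups can fall out of data dependencies is sketched below with the standard library's graphlib module (Python 3.9 or later). The step names and data names are made up for illustration, and the scheduling shown is a generic topological sort, not Azure ML's own scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical steps, each declaring the data it reads and writes.
steps = {
    "prepare": {"inputs": ["raw"], "outputs": ["clean"]},
    "train": {"inputs": ["clean"], "outputs": ["model"]},
    "profile": {"inputs": ["clean"], "outputs": ["report"]},
    "evaluate": {"inputs": ["model"], "outputs": ["metrics"]},
}


def dependency_graph(steps):
    """Map each step to the set of steps that produce its inputs."""
    producers = {
        output: name for name, spec in steps.items() for output in spec["outputs"]
    }
    return {
        name: {producers[item] for item in spec["inputs"] if item in producers}
        for name, spec in steps.items()
    }


def execution_waves(steps):
    """Group steps into waves; steps in the same wave could run in parallel."""
    sorter = TopologicalSorter(dependency_graph(steps))
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(steps)
print(waves)
```

Here train and profile both depend only on prepare, so they land in the same wave, while evaluate has to wait for train.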