Schedule pipelines and experiments in Azure Machine Learning


Stephen Allwright

How does scheduling work in Azure Machine Learning?

In Azure Machine Learning, you run experiments to train or score a model. These experiments can run on their own or as a sequence of steps called a pipeline, and a pipeline can then be scheduled to run on either a time-based or an event-based schedule. So, to schedule model training or scoring, we need to: create a pipeline, publish the pipeline, and schedule the published pipeline.

Publishing a pipeline in Azure ML

To schedule a pipeline, you first need to publish it. In this example I will publish a simple one-step pipeline, and I will assume that you already have your compute target and environment set up in Azure ML.

from azureml.core import Experiment, Environment, Workspace
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import Pipeline

# Define variables
# (experiment names may only contain letters, numbers, underscores and hyphens)
experiment_name = "my-experiment"
compute_name = "my-compute"
train_source_dir = "./code"
train_entry_point = "train.py"
environment_name = "training"
pipeline_name = "My pipeline"
pipeline_desc = "Pipeline to train my model"
pipeline_ver = "1.0"  # publish_pipeline expects the version as a string

# Connect to the workspace (reads config.json)
ws = Workspace.from_config()

# Get the registered environment and the existing compute target
env = Environment.get(workspace=ws, name=environment_name)
compute_target = ws.compute_targets[compute_name]

# Create the run configuration
run_config = RunConfiguration()
run_config.target = compute_target
run_config.environment = env

# Create the single step
main_step = PythonScriptStep(
    script_name=train_entry_point,
    source_directory=train_source_dir,
    compute_target=compute_target,
    runconfig=run_config,
    allow_reuse=False,
)

steps = [main_step]

# Create the pipeline object (steps is already a list, so pass it directly)
pipeline1 = Pipeline(workspace=ws, steps=steps)

# Submit the pipeline as an experiment run
pipeline_run1 = Experiment(ws, experiment_name).submit(pipeline1)

# Publish the pipeline from this run so it can be scheduled and reused
published_pipeline1 = pipeline_run1.publish_pipeline(
    name=pipeline_name,
    description=pipeline_desc,
    version=pipeline_ver,
)

Scheduling a pipeline in Azure ML

Now that the pipeline is published, we can schedule it. In this example I am using a time-based schedule, but it is also possible to create an event-based schedule, which runs the pipeline when files are added to or modified in a datastore.

from azureml.core import Workspace
from azureml.pipeline.core import PublishedPipeline
from azureml.pipeline.core.schedule import ScheduleRecurrence, Schedule

# Define variables
# (experiment names may only contain letters, numbers, underscores and hyphens)
experiment_name = "my-experiment"
pipeline_name = "My pipeline"
schedule_name = "My schedule"
schedule_desc = "Weekly schedule"

# Define a weekly schedule: every Monday at 00:00
schedule_frequency = "Week"
schedule_interval = 1
schedule_week_days = ["Monday"]
schedule_time_of_day = "00:00"

# Connect to the workspace
ws = Workspace.from_config()

# Find the ID of the published pipeline we want to schedule, using its name
# (if several share the name, the last one listed is used)
pipeline_id = None
for pipeline in PublishedPipeline.list(ws):
    if pipeline.name == pipeline_name:
        pipeline_id = pipeline.id
if pipeline_id is None:
    raise ValueError(f"No published pipeline named '{pipeline_name}' was found")

# Create the schedule recurrence
recurrence = ScheduleRecurrence(
    frequency=schedule_frequency,
    interval=schedule_interval,
    week_days=schedule_week_days,
    time_of_day=schedule_time_of_day,
)

# Create the schedule; each triggered run is submitted under experiment_name
recurring_schedule = Schedule.create(
    ws,
    name=schedule_name,
    description=schedule_desc,
    pipeline_id=pipeline_id,
    experiment_name=experiment_name,
    recurrence=recurrence,
)
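To make the recurrence settings concrete, the sketch below computes, in plain Python, the next time a weekly schedule with `frequency="Week"`, `interval=1`, `week_days=["Monday"]` and `time_of_day="00:00"` should fire after a given moment. It only illustrates what those settings mean; it is not how Azure ML computes run times internally. It assumes times are in UTC, which is what `ScheduleRecurrence` uses when no `time_zone` is given, and the function name `next_weekly_run` is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def next_weekly_run(after: datetime, week_days: list, time_of_day: str) -> datetime:
    """Return the first run strictly after `after` for an interval-1 weekly schedule.

    `after` must be timezone-aware; it is converted to UTC first.
    `time_of_day` uses the same "HH:MM" format as the schedule script.
    """
    after = after.astimezone(timezone.utc)
    hour, minute = (int(part) for part in time_of_day.split(":"))
    wanted = {WEEKDAYS.index(day) for day in week_days}
    # Check today and the next seven days; if any weekday is wanted, one matches
    for offset in range(8):
        day = (after + timedelta(days=offset)).date()
        candidate = datetime.combine(day, time(hour, minute), tzinfo=timezone.utc)
        if day.weekday() in wanted and candidate > after:
            return candidate
    raise ValueError("week_days must contain at least one weekday name")


if __name__ == "__main__":
    # 2024-01-03 is a Wednesday, so the next Monday 00:00 UTC is 2024-01-08
    now = datetime(2024, 1, 3, 15, 30, tzinfo=timezone.utc)
    print(next_weekly_run(now, ["Monday"], "00:00"))  # prints 2024-01-08 00:00:00+00:00
```

A run that is exactly at the scheduled moment is treated as already due, so the function returns the following week's slot in that case.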

Changing the schedule of a published pipeline in Azure ML

Once the pipeline has been scheduled, you can change the schedule by editing your schedule script and running it separately from the publishing script. This creates a new schedule without changing the contents of the pipeline itself. Note that the old schedule stays active until you disable it.
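Because `Schedule.create` adds a new schedule alongside any existing one, you may want to disable the previous schedule after creating its replacement. The sketch below is a hypothetical helper, `disable_other_schedules`, that takes the active schedules for a pipeline and calls `disable()` on every one except the schedule you want to keep. It assumes each schedule object has an `id` attribute and a `disable(wait_for_provisioning=...)` method, which is how I understand the Azure ML SDK v1 `Schedule` class; `FakeSchedule` is a stand-in used only to exercise the helper. Alternatively, an existing schedule can be changed in place with its `update(recurrence=...)` method.

```python
from typing import Iterable, List


def disable_other_schedules(schedules: Iterable, keep_id: str) -> List[str]:
    """Disable every schedule whose ID differs from `keep_id`.

    `schedules` is any iterable of objects with an `id` attribute and a
    `disable(wait_for_provisioning=...)` method. Returns the disabled IDs.
    Matching on ID rather than name also handles two schedules that share a name.
    """
    disabled = []
    for schedule in schedules:
        if schedule.id != keep_id:
            # Wait until the service has applied the change before moving on
            schedule.disable(wait_for_provisioning=True)
            disabled.append(schedule.id)
    return disabled


class FakeSchedule:
    """Stand-in for an Azure ML Schedule, for illustration only."""

    def __init__(self, schedule_id: str):
        self.id = schedule_id
        self.disabled = False

    def disable(self, wait_for_provisioning: bool = False) -> None:
        self.disabled = True


# With the Azure ML SDK v1 (not run here), roughly:
#   active = Schedule.list(ws, pipeline_id=pipeline_id)  # active_only defaults to True
#   disable_other_schedules(active, keep_id=recurring_schedule.id)

if __name__ == "__main__":
    old, new = FakeSchedule("schedule-old"), FakeSchedule("schedule-new")
    print(disable_other_schedules([old, new], keep_id="schedule-new"))  # prints ['schedule-old']
```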

Scheduling in DevOps vs Azure ML

It is possible to create CI/CD pipelines in Azure DevOps that behave like pipelines in Azure ML, where an experiment is triggered on a time-based or event-based schedule from within Azure DevOps. While this works, it is not recommended, as it goes against typical MLOps principles. It is better to use your CI/CD pipelines to create and schedule your pipelines in Azure ML whenever new code is pushed to the main branch.



I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.
