How does scheduling work in Azure Machine Learning?
In Azure Machine Learning you run experiments to train or score a model. These experiments can be run on their own or as a sequence of steps called a pipeline. A pipeline can then be scheduled to run on either a time-based or an event-based schedule. To schedule model training or scoring we therefore need to: create a pipeline, publish the pipeline, and schedule the published pipeline.
Publishing a pipeline in Azure ML
To schedule a pipeline you first need to publish it. In this example I publish a simple one-step pipeline, and I assume that your compute and environment are already set up in Azure ML.
from azureml.core import Environment, Experiment, Workspace
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# Define variables
# Experiment names may only contain letters, numbers, dashes and underscores
experiment_name = "my-experiment"
compute_name = "my-compute"
train_source_dir = "./code"
train_entry_point = "train.py"
environment_name = "training"
pipeline_name = "My pipeline"
pipeline_desc = "Pipeline to train my model"
pipeline_ver = "1.0"  # the version is passed as a string

# Connect to the workspace
ws = Workspace.from_config()

# Define the environment and compute
env = Environment.get(workspace=ws, name=environment_name)
compute_target = ws.compute_targets[compute_name]

# Create the run configuration
run_config = RunConfiguration()
run_config.target = compute_target
run_config.environment = env

# Create the single step
main_step = PythonScriptStep(
    script_name=train_entry_point,
    source_directory=train_source_dir,
    compute_target=compute_target,
    runconfig=run_config,
    allow_reuse=False,
)
steps = [main_step]

# Create the pipeline object (steps is already a list, so pass it directly)
pipeline1 = Pipeline(workspace=ws, steps=steps)

# Create the experiment run for the pipeline
pipeline_run1 = Experiment(ws, experiment_name).submit(pipeline1)

# Publish this pipeline run so it can be scheduled and used again
published_pipeline1 = pipeline_run1.publish_pipeline(
    name=pipeline_name,
    description=pipeline_desc,
    version=pipeline_ver,
)
Scheduling a pipeline in Azure ML
Now that the pipeline is published we can schedule it. In this example I schedule it on a time-based schedule, but it is also possible to create an event-based schedule.
from azureml.core import Workspace
from azureml.pipeline.core import PublishedPipeline
from azureml.pipeline.core.schedule import Schedule, ScheduleRecurrence

# Define variables
# Experiment names may only contain letters, numbers, dashes and underscores
experiment_name = "my-experiment"
pipeline_name = "My pipeline"
schedule_name = "My schedule"
schedule_desc = "Weekly schedule"

# Define a weekly schedule every Monday at 00:00
schedule_frequency = "Week"
schedule_interval = 1
schedule_week_days = ["Monday"]
schedule_time_of_day = "00:00"

# Connect to the workspace
ws = Workspace.from_config()

# Find the ID of the published pipeline we want to schedule, using its name
pipeline_id = None
for pipeline in PublishedPipeline.list(ws):
    if pipeline.name == pipeline_name:
        pipeline_id = pipeline.id
if pipeline_id is None:
    raise ValueError(f"No published pipeline named {pipeline_name!r} was found")

# Create the schedule recurrence
recurrence = ScheduleRecurrence(
    frequency=schedule_frequency,
    interval=schedule_interval,
    week_days=schedule_week_days,
    time_of_day=schedule_time_of_day,
)

# Create the schedule
recurring_schedule = Schedule.create(
    ws,
    name=schedule_name,
    description=schedule_desc,
    pipeline_id=pipeline_id,
    experiment_name=experiment_name,
    recurrence=recurrence,
)
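For comparison, here is a rough sketch of an event-based schedule, which runs the pipeline when files are added or changed in a blob datastore instead of at a fixed time. It relies on the `datastore`, `polling_interval` (in minutes) and `path_on_datastore` arguments of `Schedule.create`. The helper names, the schedule name and the `incoming/` path are my own illustrative choices, not part of the SDK; `workspaceblobstore` is assumed to be the datastore you want to watch.

```python
def build_datastore_schedule_kwargs(name, pipeline_id, experiment_name,
                                    path_on_datastore=None, polling_interval=5,
                                    description=None):
    """Collect the keyword arguments for a datastore-triggered schedule.

    Hypothetical helper: it only assembles and checks values, so it can be
    exercised without the Azure ML SDK installed.
    """
    if polling_interval < 1:
        raise ValueError("polling_interval is in minutes and must be at least 1")
    kwargs = {
        "name": name,
        "pipeline_id": pipeline_id,
        "experiment_name": experiment_name,
        "polling_interval": polling_interval,
    }
    if path_on_datastore is not None:
        kwargs["path_on_datastore"] = path_on_datastore
    if description is not None:
        kwargs["description"] = description
    return kwargs


def create_datastore_schedule(ws, datastore_name, **kwargs):
    """Create a schedule that fires when blobs under the watched path change."""
    # Imported here so the helper above stays usable without the SDK.
    from azureml.core import Datastore
    from azureml.pipeline.core.schedule import Schedule

    datastore = Datastore.get(ws, datastore_name)
    return Schedule.create(ws, datastore=datastore, **kwargs)


# Example usage (needs a workspace and a published pipeline ID):
# kwargs = build_datastore_schedule_kwargs(
#     name="My reactive schedule",       # assumed name
#     pipeline_id=pipeline_id,
#     experiment_name="my-experiment",
#     path_on_datastore="incoming/",     # assumed folder to watch
# )
# reactive_schedule = create_datastore_schedule(ws, "workspaceblobstore", **kwargs)
```

Note that no `recurrence` is passed here: the schedule is driven by the datastore rather than by the clock.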
Changing the schedule of a published pipeline in Azure ML
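Once the pipeline has been scheduled it is possible to change the schedule. This is done by editing your pipeline schedule script and running it separately from the publish pipeline script. This way you create a new schedule without changing the contents of the pipeline itself. Note that running the script again creates an additional schedule rather than replacing the existing one, so disable the old schedule if it should no longer fire.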
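Schedules that have been superseded can be disabled so that they stop triggering runs. The following is a minimal sketch of one way to do that, assuming you want to keep only the most recently created schedule for a pipeline. `disable_other_schedules` is a hypothetical helper of mine; it only relies on each schedule object having an `id`, a `name` and a `disable()` method, which the SDK's `Schedule` objects provide.

```python
def disable_other_schedules(schedules, keep_id):
    """Disable every schedule except the one whose ID is keep_id.

    Returns the names of the schedules that were disabled.
    """
    disabled = []
    for schedule in schedules:
        if schedule.id != keep_id:
            schedule.disable()
            disabled.append(schedule.name)
    return disabled


# Example usage (needs a workspace; Schedule.list returns active schedules by default):
# from azureml.pipeline.core.schedule import Schedule
# existing = Schedule.list(ws, pipeline_id=pipeline_id)
# disable_other_schedules(existing, keep_id=recurring_schedule.id)
```

Schedules are matched by ID rather than by name here, because two schedules created from the same script would share a name.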
Scheduling in DevOps vs Azure ML
It is possible to create CI/CD pipelines in Azure DevOps that act similarly to pipelines in Azure ML, whereby an experiment is created on a timed or event-based schedule within Azure DevOps. Whilst possible, this is not recommended, as it goes against typical MLOps principles. It is better to use your CI/CD pipelines to create and schedule your pipelines within Azure ML when new code is pushed to the main branch.
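When the publish and schedule scripts run from a CI/CD pipeline there is usually nobody around to complete an interactive login, so the workspace connection typically uses a service principal instead. The sketch below shows one way this could look. The environment variable names are my own assumptions and would need to match whatever secrets your CI/CD pipeline exposes; `ServicePrincipalAuthentication` and `Workspace.get` are the SDK calls doing the actual work.

```python
import os

# Hypothetical variable names; map these to the secrets defined in your CI/CD pipeline.
REQUIRED_VARS = (
    "AML_TENANT_ID",
    "AML_CLIENT_ID",
    "AML_CLIENT_SECRET",
    "AML_SUBSCRIPTION_ID",
    "AML_RESOURCE_GROUP",
    "AML_WORKSPACE_NAME",
)


def read_ci_settings(env=None):
    """Read the connection settings, failing early if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing CI variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def connect_workspace(settings):
    """Connect to the workspace with a service principal (no interactive login)."""
    # Imported here so read_ci_settings can be checked without the SDK installed.
    from azureml.core import Workspace
    from azureml.core.authentication import ServicePrincipalAuthentication

    auth = ServicePrincipalAuthentication(
        tenant_id=settings["AML_TENANT_ID"],
        service_principal_id=settings["AML_CLIENT_ID"],
        service_principal_password=settings["AML_CLIENT_SECRET"],
    )
    return Workspace.get(
        name=settings["AML_WORKSPACE_NAME"],
        auth=auth,
        subscription_id=settings["AML_SUBSCRIPTION_ID"],
        resource_group=settings["AML_RESOURCE_GROUP"],
    )


# Example usage inside the CI job:
# ws = connect_workspace(read_ci_settings())
```

The publish and schedule scripts can then use this `ws` in place of `Workspace.from_config()`.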