Set up CI/CD pipelines for machine learning in Azure DevOps

Continuous integration and continuous deployment (CI/CD) pipelines in Azure DevOps are an effective way of automatically training your machine learning models after new code has been pushed to a branch.

Stephen Allwright

15 Nov 2020

Why you should use CI/CD pipelines in your ML infrastructure

Continuous integration and continuous deployment (CI/CD) pipelines in Azure DevOps are an effective way to automate your data science infrastructure: they can automatically train your machine learning models whenever new code is pushed to a branch. This speeds up deployment to production and removes previously manual tasks.

Setting up machine learning infrastructure is becoming increasingly important for data scientists and their projects. Thus, in this post I will explain how you can set up build pipelines in Azure DevOps to automatically train a machine learning model and register it within Azure Machine Learning.

Setting up infrastructure in DevOps to train your machine learning model

The goal for this pipeline is to do the following:

Trigger the build pipeline when code is pushed to the master branch
Run an experiment in Azure Machine Learning
Train and pickle the model in the experiment
Register the model for later use

This example assumes that you have the following:

A script which trains a machine learning model and pickles it into a given folder
A Machine Learning Workspace set up in Azure
A compute instance within your Azure Machine Learning Workspace
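To make the first assumption concrete, here is a minimal sketch of what such a training script could look like. It is not the training script used in this post: the toy data, the least-squares "model", and the function names are all hypothetical stand-ins, and a real script would typically use a library such as scikit-learn. The one detail that matters for the rest of the post is that the pickled model ends up in the outputs folder, since Azure Machine Learning uploads the contents of that folder to the run.

```python
# code/train.py (hypothetical sketch, not the script used in this post)
import os
import pickle


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def train_and_save(output_dir="outputs", file_name="model.pkl"):
    # Toy data standing in for a real training set
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    model = fit_line(xs, ys)

    # Pickle the fitted model into the given folder
    os.makedirs(output_dir, exist_ok=True)
    model_file = os.path.join(output_dir, file_name)
    with open(model_file, "wb") as f:
        pickle.dump(model, f)
    return model_file


if __name__ == "__main__":
    print(train_and_save())
```

With the defaults, the model is written to outputs/model.pkl, which matches the model path used when registering the model later in this post.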

Creating the DevOps build pipeline with a YAML file

To create the build pipeline, navigate to the pipeline section of DevOps and create a 'starter build pipeline', hosted within your repository, and use the following YAML code.

#Build pipeline

#Define the branch which will trigger the pipeline
trigger:
  branches:
    include:
      - master

#Define the image to be used
pool:
  vmImage: "ubuntu-latest"

steps:

#Set the python version
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.9'
  displayName: 'Setting Python version'

#Install the packages needed to run this pipeline
- script: |
    pip install azureml-sdk
  displayName: 'Install packages needed'

#Run the script which starts the experiment
- task: AzureCLI@2
  inputs:
    azureSubscription: 'your-azure-subscription'
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: 'python code/start_experiment.py'
    addSpnToEnvironment: true
  displayName: 'Run experiment'
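The addSpnToEnvironment setting makes the service principal details of the service connection available to the script as the variables servicePrincipalId, servicePrincipalKey and tenantId. The script in this post does not read them, but if you want to authenticate explicitly, a sketch along the following lines might work. The helper names are hypothetical, and it assumes the service connection uses a service principal secret, so that servicePrincipalKey is populated.

```python
import os


def read_spn_credentials(environ=os.environ):
    """Collect the service principal values exposed by the AzureCLI@2 task
    when addSpnToEnvironment is true."""
    names = {
        "tenant_id": "tenantId",
        "service_principal_id": "servicePrincipalId",
        "service_principal_password": "servicePrincipalKey",
    }
    missing = [var for var in names.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("Missing pipeline variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in names.items()}


def connect_to_workspace(subscription_id, resource_group, workspace_name):
    # Imported lazily so the helper above works without azureml-sdk installed
    from azureml.core import Workspace
    from azureml.core.authentication import ServicePrincipalAuthentication

    auth = ServicePrincipalAuthentication(**read_spn_credentials())
    return Workspace(
        subscription_id=subscription_id,
        resource_group=resource_group,
        workspace_name=workspace_name,
        auth=auth,
    )
```

In the Python script, a call such as connect_to_workspace(subscription_id, resource_group, workspace_name) could then stand in for the plain Workspace(...) call.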

Creating a Python script to train the machine learning model and register with Azure Machine Learning

The Python script which is called within the previously created build pipeline would look like this.

#code/start_experiment.py

from azureml.core import (
    Workspace,
    Experiment,
    Environment,
    RunConfiguration,
    ScriptRunConfig,
)

#Define the name of your created compute instance
compute_name = 'my-compute'
#Define the name of your new experiment and environment
experiment_name = 'my-experiment'
environment_name = 'my-environment'
#Define the name of the model and where it is saved
model_name = 'my-model'
model_path = 'outputs/model.pkl'
#Define the directory you want to run the experiment from
source_directory = '.'
#Define the entry script for the experiment
script_path = 'code/train.py'
#Define the location of the machine learning workspace
subscription_id = 'subscription-id'
resource_group  = 'resource-group'
workspace_name  = 'my-workspace'

#Connect to your workspace
ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)

#Use the workspace to create an experiment
exp = Experiment(workspace=ws, name=experiment_name)

#Create an environment with the packages you need
env = Environment(name=environment_name)

for pip_package in ['numpy','pandas']:
    env.python.conda_dependencies.add_pip_package(pip_package)


#Create a run configuration to connect our environment and compute
run_config = RunConfiguration()
run_config.target = compute_name
run_config.environment = env

#Create a script run config to tie all the elements together
config = ScriptRunConfig(
    source_directory=source_directory, script=script_path, run_config=run_config
)

#Submitting the experiment will start it
run = exp.submit(config)

#Wait for the completion and show the output of the experiment as we go
run.wait_for_completion(show_output=True, wait_post_processing=True)

#Register the model with the experiment
run.register_model(model_name=model_name, model_path=model_path)

And there we have it. This script starts your experiment on a remote compute instance and registers the resulting model, and the pipeline runs it every time a change is pushed to the master branch.
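Once the model is registered, it can be fetched again for later use. The following is a rough sketch of how that might look, assuming the model was registered from a single pickle file as above. The function names and the target directory are hypothetical, and you should only unpickle files that come from a source you trust.

```python
import pickle


def load_pickled_model(model_file):
    """Load a pickled model from a local file."""
    with open(model_file, "rb") as f:
        return pickle.load(f)


def download_registered_model(ws, model_name="my-model", target_dir="downloaded_model"):
    # Imported lazily so load_pickled_model works without azureml-sdk installed
    from azureml.core import Model

    # Without an explicit version, the latest registered version is returned
    model = Model(ws, name=model_name)
    # download() returns the local path of the downloaded model
    local_path = model.download(target_dir=target_dir, exist_ok=True)
    return load_pickled_model(local_path)
```

Here ws is a Workspace object like the one created in the script above.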

Troubleshooting

Why do I not need to install machine learning packages in my build pipeline?

The build pipeline only needs the packages required to submit the experiment (here, azureml-sdk), not the packages required for the actual model training. This is because the model is trained on a remote compute instance within its own environment, which is defined in the Python script.




I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.