Tracking metrics is a crucial part of the development phase of new machine learning models. In recent years, many tools dedicated to exactly this purpose have been introduced, for example, Weights & Biases and MLflow.
When data scientists develop a new model in their local environment, they use a tracking tool of their choice. Once development is finished, the same code moves to production and is used in (re-)training workflows running on SAP AI Core. To track metrics in production, you can use the tracking functionality of SAP AI Core and conveniently access the metrics via SAP AI Launchpad. And here is the problem: moving from development to production requires you to rewrite the metrics-tracking code from your tool of choice in the local environment to the SAP AI Core SDK in the productive environment, and vice versa.
This blog post showcases how you can track metrics in your local and productive environment without requiring code changes using the popular library MLflow and the tracking functionality of SAP AI Core. In addition, we are using the Metaflow library for SAP AI Core to generate our workflow templates. If you are not familiar with this library, check out our recent blog post.
All the code shown in this blog post can be found here so that you can easily follow along.
Tracking with MLflow
Metrics tracking plays a crucial role in all stages of the development phase of a machine learning model. One popular library among data scientists is MLflow with its tracking capabilities. It enables users to log parameters and metrics across different runs and to visualize them in a graphical user interface.
Additionally, MLflow comes with an auto-logging functionality that automatically tracks all relevant metrics of models created with libraries like TensorFlow, PyTorch, or scikit-learn. This is very convenient, as data scientists do not need to explicitly define which metrics to track.
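As a quick illustration of auto-logging outside of any SAP context, the following snippet trains a scikit-learn model and lets MLflow record its parameters and training metrics without a single explicit logging call:

import mlflow
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# One call enables auto-logging for every scikit-learn estimator
mlflow.sklearn.autolog()

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
with mlflow.start_run():
    # fit() is intercepted: parameters (e.g. alpha) and training
    # metrics are logged to the active MLflow run automatically
    Ridge(alpha=0.1).fit(X, y)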
Creating an abstraction for tracking
The following demonstrates how to create an abstraction so that metrics tracking can be used both locally and in production on SAP AI Core without changes.
We realize this abstraction with a Python class called TrackingContext. The class extends the functionality of the mlflow.start_run() context manager. If the workflow is executed locally, it behaves just like MLflow's built-in tracking. If the workflow is executed in production on SAP AI Core, it connects to the tracking API of SAP AI Core and logs all tracked metrics, which users can visualize in SAP AI Launchpad or query via the API of SAP AI Core.
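The full implementation is available in the repository linked above. To illustrate the core idea, here is a minimal sketch of what such a class could look like. Note that detecting the SAP AI Core environment via an AICORE_EXECUTION_ID variable is an assumption for illustration, and the SDK calls follow the tracking interface shown in SAP's tutorials:

import os
from datetime import datetime

import mlflow

class TrackingContext:
    """Wraps mlflow.start_run(); on SAP AI Core, the captured metrics
    are additionally forwarded to the SAP AI Core SDK on exit."""

    def __enter__(self):
        self._run = mlflow.start_run()
        return self._run

    def __exit__(self, exc_type, exc_value, traceback):
        mlflow.end_run()
        if exc_type is None and self._running_on_ai_core():
            self._push_metrics_to_ai_core()
        return False  # do not swallow exceptions

    @staticmethod
    def _running_on_ai_core():
        # Assumption: an execution id is injected into the pod environment
        # on SAP AI Core; locally this variable is absent.
        return 'AICORE_EXECUTION_ID' in os.environ

    def _push_metrics_to_ai_core(self):
        from ai_core_sdk.models import Metric
        from ai_core_sdk.tracking import Tracking

        # Re-read everything MLflow captured for this run ...
        data = mlflow.get_run(self._run.info.run_id).data
        # ... and hand it over to the SAP AI Core tracking API
        Tracking().log_metrics(metrics=[
            Metric(name=name, value=float(value),
                   timestamp=datetime.utcnow(), labels=[])
            for name, value in data.metrics.items()
        ])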
The following code block shows a simple demonstration of this functionality. The goal is to train a simple linear regression model on the California Housing dataset and track the relevant metrics.
To execute the script, comment out the @kubernetes and @argo decorators and run the script with the following command: python trainflow.py run
The script loads the dataset, splits it into a train and a test set, and executes the model training. Note in the code how the auto-logging functionality for scikit-learn is activated and how the class TrackingContext is used as a context manager.
from metaflow import FlowSpec, step, kubernetes, argo, Parameter
from AIC_AutoLogging import TrackingContext
import mlflow
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split


class TrainFlow(FlowSpec):

    alpha = Parameter('alpha', help='Alpha', default=0.01)

    @kubernetes(image="docker.io/<repository>/image:0.0.1",
                secrets=['default-object-store-secret'])
    @argo(input_artifacts=[{"name": "housing", "path": "/data/housing.csv"}])
    @step
    def start(self):
        print(self.alpha)
        # Activate MLflow auto-logging for scikit-learn models
        mlflow.sklearn.autolog()
        # Load the California Housing dataset provided as an input artifact
        housing = pd.read_csv('/data/housing.csv')
        X_train, X_test, y_train, y_test = train_test_split(
            housing.loc[:, housing.columns != 'Target'], housing['Target'])
        linreg = Ridge(alpha=self.alpha)
        # Logs to MLflow locally and to SAP AI Core in production
        with TrackingContext() as run:
            linreg.fit(X_train, y_train)
        self.next(self.end)

    @kubernetes(image="docker.io/<repository>/image:0.0.1",
                secrets=['default-object-store-secret'])
    @step
    def end(self):
        print('End of flow')


if __name__ == '__main__':
    TrainFlow()
Once the training process has finished, you can visualize and compare the results of different training runs using MLflow’s user interface. As we didn’t specify which metrics to track but rather leveraged the auto-logging capability, all relevant metrics for this kind of model were tracked.
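If the user interface is not already running, you can start it locally with the command: mlflow ui. By default, it is then available in your browser at http://127.0.0.1:5000.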
Running on SAP AI Core
But what happens when we execute this workflow on SAP AI Core? Uncomment the @kubernetes and @argo decorators and generate the workflow template with the following command using the Metaflow library for SAP AI Core:
python trainflow.py --with=kubernetes:secrets=default-object-store-secret argo create --image-pull-secret=<AI Core docker secret> --label={"scenarios.ai.sap.com/id":"<Ml-Scenario>","ai.sap.com/version":"1.0.0"} --annotation={"scenarios.ai.sap.com/name":"<ML-Scenario-Name>","executables.ai.sap.com/name":"trainflow"} --only-json > trainflow.json
From here, store your new workflow template in your Git repository and execute the training on SAP AI Core as usual. If you are new to SAP AI Core, check out this blog post or our tutorials.
Once the training has finished, head to SAP AI Launchpad to inspect your metrics. As intended, the same metrics were tracked and are now visible in SAP AI Launchpad as they were previously in MLflow. In SAP AI Launchpad, you can now compare your metrics across different runs in tables or even using charts.
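Besides SAP AI Launchpad, you can also query the metrics programmatically via the API of SAP AI Core. The following is a minimal sketch assuming the AICoreV2Client from the SAP AI Core SDK; the credential placeholders come from your service key, and the exact shape of the query response may differ slightly between SDK versions:

from ai_core_sdk.ai_core_v2_client import AICoreV2Client

# Placeholders: fill in the values from your SAP AI Core service key
client = AICoreV2Client(
    base_url='<AI-API-URL>/v2',
    auth_url='<auth-url>/oauth/token',
    client_id='<client-id>',
    client_secret='<client-secret>',
    resource_group='default'
)

# Assumption: query all metrics logged for one execution
response = client.metrics.query(execution_ids='<execution-id>')
for execution_metrics in response.resources:
    for metric in execution_metrics.metrics:
        print(metric.name, metric.value)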
Without explicitly defining which metrics to track, we were successfully able to track and compare our metrics across different training runs on our local machine as well as on SAP AI Core.
Limitations
The current implementation of the TrackingContext class fetches all saved metrics from MLflow and passes them to the SAP AI Core SDK. This happens after the training process, when the __exit__ method of the TrackingContext class is triggered. Therefore, a live update of the metrics is currently not supported.
Wrap-up
Data scientists face the issue that their local environment is different from their productive environment on SAP AI Core. Consequently, changing the environment requires changes in the code.
In this blog post, we demonstrated how data scientists can overcome this issue and use the same code to track their metrics in both their local environment using MLflow and their productive environment on SAP AI Core.
The abstraction we implemented checks the environment in which it is executed, whether locally or on SAP AI Core. If it is executed on SAP AI Core, we persist the captured metrics via the SAP AI Core SDK. Otherwise, we default to MLflow's behavior.
We hope that this can make the life of data scientists easier and simplify the transition between development and production environments.
For more information on SAP AI Core & SAP AI Launchpad:
- Follow us in the SAP Community: SAP AI Core | SAP AI Launchpad
- Start innovating with our tutorials: SAP AI Core | SAP AI Launchpad
- Find guidance in the SAP Help Portal: SAP AI Core | SAP AI Launchpad