Tracking metrics is a crucial part of the development phase of new machine learning models. In recent years, many tools dedicated to this purpose have been introduced, for example, Weights & Biases, MLflow, and more.

When data scientists develop a new model in their local environment, they use a tracking tool of their choice. Once development is finished, the same code moves to production and is used in (re-)training workflows running on SAP AI Core. To track metrics in production, you can use the tracking functionality of SAP AI Core and conveniently access the metrics via SAP AI Launchpad. Here is the problem: moving from development to production requires you to rewrite your tracking code, from the tracking tool of your choice in the local environment to the SAP AI Core SDK in the productive environment, and vice versa.

This blog post showcases how you can track metrics in your local and productive environment without requiring code changes using the popular library MLflow and the tracking functionality of SAP AI Core. In addition, we are using the Metaflow library for SAP AI Core to generate our workflow templates. If you are not familiar with this library, check out our recent blog post. 

All the code shown in this blog post can be found here so that you can easily follow along. 

 

Tracking with MLflow 

At all stages of the development of a machine learning model, metrics tracking plays a crucial role. One popular library among data scientists is MLflow with its tracking capabilities. It enables users to log parameters and metrics across different runs and to visualize them in a graphical user interface.

Additionally, MLflow comes with an auto-logging functionality that automatically tracks all relevant metrics of models created with libraries like TensorFlow, PyTorch, or scikit-learn. This is very convenient, as data scientists do not need to explicitly define which metrics to track.
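As a quick local illustration (a standalone example, not taken from the blog's repository), two lines are enough to have MLflow record the hyperparameters and training metrics of a scikit-learn model:

import mlflow
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Enable auto-logging for all scikit-learn estimators
mlflow.sklearn.autolog()

X, y = make_regression(n_samples=100, n_features=4, random_state=0)
with mlflow.start_run():
    # Parameters (e.g. alpha) and training metrics are logged automatically
    Ridge(alpha=0.01).fit(X, y)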

 

Creating an abstraction for the tracking 

The following demonstrates how to create an abstraction so that the metrics tracking can be used locally and in production on SAP AI Core without code changes.

We realize this abstraction with a Python class called TrackingContext. The class extends the functionality of the mlflow.start_run() context manager. If the workflow is executed locally, it behaves just like MLflow's built-in tracking. If the workflow is executed in production on SAP AI Core, it connects to the tracking API of SAP AI Core and logs all tracked metrics, which users can visualize in SAP AI Launchpad or query via the API of SAP AI Core.
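To make this concrete, here is a minimal sketch of what such a class could look like. Note that the environment check via the AICORE_EXECUTION_ID variable and the exact SAP AI Core SDK calls are assumptions for illustration; the actual implementation can be found in the linked repository.

import os
from datetime import datetime

import mlflow
from mlflow.tracking import MlflowClient

class TrackingContext:
    """Wraps mlflow.start_run(); on SAP AI Core, forwards the metrics."""

    def __enter__(self):
        self._run = mlflow.start_run()
        return self._run

    def __exit__(self, exc_type, exc_val, exc_tb):
        mlflow.end_run()
        if self._on_ai_core():
            self._forward_metrics()
        return False  # do not swallow exceptions from the with-block

    @staticmethod
    def _on_ai_core():
        # Assumption: SAP AI Core executions expose an execution ID in the
        # environment; locally, this variable is absent.
        return 'AICORE_EXECUTION_ID' in os.environ

    def _forward_metrics(self):
        # Fetch everything MLflow recorded for this run ...
        run_data = MlflowClient().get_run(self._run.info.run_id).data
        # ... and persist it via the SAP AI Core SDK (sketch)
        from ai_core_sdk.models import Metric
        from ai_core_sdk.tracking import Tracking
        Tracking().log_metrics(metrics=[
            Metric(name=name, value=value, timestamp=datetime.utcnow(), step=0)
            for name, value in run_data.metrics.items()
        ])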

 

The following code block shows a simple demonstration of this functionality. The goal is to train a simple linear regression model on the California Housing dataset and track the relevant metrics.
To execute the script locally, comment out the @kubernetes and @argo decorators and run it with the following command: python trainflow.py run

The script loads the dataset, splits it into a train and a test set, and executes the model training. Note how the auto-logging functionality for scikit-learn is activated and how the TrackingContext class is used as a context manager.

 

from metaflow import FlowSpec, step, kubernetes, argo, Parameter
from AIC_AutoLogging import TrackingContext
import mlflow
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

class TrainFlow(FlowSpec):
    alpha = Parameter('alpha', help='Alpha', default=0.01)

    @kubernetes(image="docker.io/<repository>/image:0.0.1",
                secrets=['default-object-store-secret'])
    @argo(input_artifacts=[{"name": "housing", "path": "/data/housing.csv"}])
    @step
    def start(self):
        print(self.alpha)
        # Activate MLflow auto-logging for scikit-learn
        mlflow.sklearn.autolog()
        housing = pd.read_csv('/data/housing.csv')
        # Split features and target into train and test sets
        X_train, X_test, y_train, y_test = train_test_split(
            housing.loc[:, housing.columns != 'Target'], housing['Target'])
        linreg = Ridge(alpha=self.alpha)
        # Behaves like mlflow.start_run() locally; forwards metrics on SAP AI Core
        with TrackingContext() as run:
            linreg.fit(X_train, y_train)

        self.next(self.end)

    @kubernetes(image="docker.io/<repository>/image:0.0.1",
                secrets=['default-object-store-secret'])
    @step
    def end(self):
        print('End of flow')

if __name__ == '__main__':
    TrainFlow()

 

Once the training process has finished, you can visualize and compare the results of different training runs using MLflow's user interface. As we didn't specify which metrics to track but leveraged the auto-logging capability instead, all relevant metrics for this kind of model were tracked.
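If no tracking server is configured, MLflow stores the runs locally in an mlruns folder; you can start the user interface from the same directory with the command mlflow ui and open it at http://localhost:5000 (the default address).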

[Screenshot: MLflow tracking overview]

Running on SAP AI Core 

But what happens when we execute this workflow on SAP AI Core? Uncomment the @kubernetes and @argo decorators and generate the workflow template with the following command, using the Metaflow library for SAP AI Core:

 

 
python trainflow.py --with=kubernetes:secrets=default-object-store-secret argo create --image-pull-secret=<AI Core docker secret> --label={"scenarios.ai.sap.com/id":"<Ml-Scenario>","ai.sap.com/version":"1.0.0"} --annotation={"scenarios.ai.sap.com/name":"<ML-Scenario-Name>","executables.ai.sap.com/name":"trainflow"} --only-json > trainflow.json 

 

From here, store your new workflow template in your Git repository and execute the training on SAP AI Core as usual. If you are new to SAP AI Core, check out this blog post or our tutorials. 
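As an aside, instead of starting the training through the SAP AI Launchpad UI, an execution can also be created programmatically, for example via the SAP AI Core SDK. The values below are placeholders from your SAP AI Core service key, and the snippet is a sketch rather than a complete setup:

from ai_core_sdk.ai_core_v2_client import AICoreV2Client

# Placeholder values taken from your SAP AI Core service key
client = AICoreV2Client(
    base_url='<AI-API-URL>/v2',
    auth_url='<auth-url>/oauth/token',
    client_id='<client-id>',
    client_secret='<client-secret>',
    resource_group='default',
)

# Start a training execution for a previously created configuration
execution = client.execution.create(configuration_id='<configuration-id>')
print(execution.id)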

 

Once the training has finished, head to SAP AI Launchpad to inspect your metrics. As intended, the same metrics that were previously visible in MLflow were tracked and are now visible in SAP AI Launchpad. There, you can compare your metrics across different runs in tables or even in charts.
Without explicitly defining which metrics to track, we successfully tracked and compared our metrics across different training runs on our local machine as well as on SAP AI Core.

 

Limitations 

The current implementation of the TrackingContext class fetches all saved metrics from MLflow and passes them to the SAP AI Core SDK. This happens after the training process, when the __exit__ method of the TrackingContext class is triggered. Therefore, a live update of the metrics is currently not supported.

[Screenshot: AI Launchpad comparison view]

Wrap-up 

Data scientists face the issue that their local environment differs from their productive environment on SAP AI Core. Consequently, changing the environment requires changes to the code.
In this blog post, we demonstrated how data scientists can overcome this issue and use the same code to track their metrics both in their local environment using MLflow and in their productive environment on SAP AI Core.

The abstraction we implemented checks the environment in which it is executed, whether locally or on SAP AI Core. If it is executed on SAP AI Core, we persist the captured metrics via the SAP AI Core SDK. Otherwise, we default to MLflow's behavior.
We hope this makes the lives of data scientists easier and simplifies the transition between development and production environments.

 

 

For more information on SAP AI Core & SAP AI Launchpad: 
