Sometimes we wonder whether pre-shipped algorithms are a good fit for our use cases. Do they already consider all the attributes, signals, or scenarios that matter for forecasting? SAP Integrated Business Planning comes with pre-built algorithms that work on a set of attributes customers define as relevant for a planning area. Still, it is interesting to see how a forecast result would change if the algorithm were modified, or whether we could estimate a forecast by adding our own logic to the algorithm.
With release 2205, SAP Integrated Business Planning introduces a set of APIs and a process that let customers prepare planning data, extract it from the application, and apply an algorithm they have written themselves externally. This flexibility allows users to validate new forecast scenarios with data or calculations that are relevant to their business.
Planning preparation
In this blog, I am using Google's Vertex AI as an external machine learning platform to do my forecasting. The input data is prepared in SAP Integrated Business Planning for Supply Chain. The focus here is to understand how to integrate the two platforms. To start, we need to do the following in the SAP IBP application:
- Configure the communication artifacts
- Generate and install a self-signed X.509 certificate, or use the certificate of the host where you run your external HTTP server.
- Install this certificate in SAP IBP using the Maintain Certificate Trust List application.
- Create a communication user for the Inbound call.
- Create a communication system that points to the HTTP server where your custom forecasting algorithm is implemented. You need a valid host name or IP address that is reachable over the internet.
- Create a communication arrangement which specifies the user and the system created in the above steps.
- Create a new Forecast Model in the Manage Forecast Models app in IBP. Select the newly available option — External Algorithm.
Select the option External Algorithm
You can also specify parameters and variables that might be relevant for your external algorithm.
The end-to-end process
SAP IBP comes with an Excel plugin that includes a menu item to simulate different forecasting analyses, one of them using external algorithms. When we choose this menu item to simulate statistical forecasting with an external algorithm, the Excel plugin sends a request to the SAP IBP system configured in the connection parameters. From the SAP IBP system, a notification is sent to the HTTP server that hosts your own code implementing the external algorithm. This server could also be a middleware.
Steps at high level
In our scenario, we implement a simple HTTP server that receives this notification from SAP IBP and then uses the new OData services to pull the metadata as well as the planning data that was prepared for forecasting. Our HTTP server prepares this data and sends it to the Google Cloud Platform, where a forecast is calculated using Google Vertex AI. The results are then sent back to the SAP IBP system and can later be seen in your planning view.
HTTP server
As mentioned above, we need an HTTP server to receive notifications from the SAP IBP system and trigger an external forecast computation. This service can be installed on your local machine, provided the machine is reachable from the public internet. Alternatively, you can use any cloud hosting service to run the server. The IP address or host name and the port number of this server are configured in the Communication Systems app in SAP IBP. It can also be a middleware application, as long as it can communicate with both cloud platforms. To keep things simple, I built the server in Python, in the hope that the steps taken here can easily be replicated on any integration middleware. Additionally, you need an X.509 certificate from this server for the outbound communication scenario in SAP IBP. To receive and process a notification from the SAP IBP system, we implement a REST endpoint in our HTTP server. The URL of this endpoint is used by the SAP IBP system to send a notification when we simulate statistical forecasting with an external algorithm from the Excel plugin.
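To make the notification handling concrete, here is a minimal sketch of such an endpoint using Flask. This is only an illustration, not the implementation from the repository; the endpoint path and the payload handling are assumptions, and only the request ID field reappears later when the results are posted back.

```python
# Minimal notification endpoint, sketched with Flask (an assumption; the sample
# in the GitHub repository may be structured differently).
from flask import Flask, request

app = Flask(__name__)

@app.route("/ibp/external-forecast", methods=["POST"])  # hypothetical path
def handle_ibp_notification():
    notification = request.get_json(force=True)
    # The notification identifies the forecast request; this ID is later echoed
    # back to SAP IBP together with the results (see "RequestID" further below).
    request_id = notification.get("RequestID")
    # Next steps (covered in the following sections): pull metadata and planning
    # data via the new OData services, then forward the data to Google Vertex AI.
    return {"status": "accepted", "RequestID": request_id}, 200

if __name__ == "__main__":
    # In practice this must be served over HTTPS with the certificate that is
    # registered in the SAP IBP Maintain Certificate Trust List app.
    app.run(host="0.0.0.0", port=8443)
```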
This blog gives an excellent overview of how to use the new APIs in SAP Integrated Business Planning to receive notifications and pull data from the system. A sample Python server implementation for this exercise is provided in the GitHub repository linked at the end of this blog.
Integration with Google Vertex AI
Before we integrate with Google Vertex AI, we need a valid project in the Google Cloud Platform that is attached to a billing account. This guide can be handy. Make sure you note down the project name, which we need later. My project is called "sap-ibpexternalforecast01". You must also enable the Vertex AI service in your account.
Common methods to integrate with the Google Cloud Platform are:
- Using REST based API from Google.
- gRPC/gax based client/communication.
- SDKs provided by Google.
If you are using a middleware, check whether option 2 is available; if yes, then either option 1 or 2 is a valid approach. If your middleware does not support gRPC/gax-based communication, you are probably left with option 1. I chose option 3 and used the SDKs provided by Google to make the basic calls needed for the integration. I leveraged the Python SDK, which was able to transfer the data between the SAP and Google cloud platforms. I expect the reader to have a basic understanding of Python; however, the steps detailed in this section are generic and not specific to Python.
- Install Python version 3 or above.
- Set up an isolated Python environment. It is highly recommended to give each project its own virtual environment when writing Python code locally. This can be done with the venv module:
cd to-your-python-project
py -m venv env
.\env\Scripts\activate
- Install the packages needed for your project. The following Python packages are required: requests, pandas, google-cloud-storage, google-cloud-aiplatform, and google-cloud-bigquery. For example:
pip install --upgrade google-cloud-aiplatform
- Now you need a JSON key file containing the application credentials of your service account. This key file is used through an environment variable called "GOOGLE_APPLICATION_CREDENTIALS". Once you have downloaded the JSON key file, create an environment variable with that name and set its value to the path of the key file. You can follow the detailed steps defined here. A small verification sketch follows this list.
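As a quick sanity check, the sketch below (with a hypothetical key file path) shows how the credentials are picked up by the Google client libraries once the environment variable is set.

```python
# Sketch: verify that GOOGLE_APPLICATION_CREDENTIALS is set correctly.
import os

# Hypothetical path to the downloaded service account key file.
os.environ.setdefault("GOOGLE_APPLICATION_CREDENTIALS", r"C:\keys\sap-ibp-ext-forecast.json")

from google.cloud import bigquery

# If the key file is valid, the client resolves the project from the credentials.
print(bigquery.Client().project)
```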
The data received from the SAP IBP system needs to be sent to Google Vertex AI. The steps implemented in the HTTP server are listed below (a simplified orchestration sketch follows the list):
- Create a dataset in BigQuery and a table to store training data.
- A separate table for scoring is also needed in the dataset.
- Upload the data to the new tables.
- Create a dataset in Google Vertex AI, which is needed to create a model.
- Train the model using the data from the BigQuery table.
- Start a batch prediction job on the model using scoring data from BigQuery.
- Once the results are available, download the data, format it, and send it to SAP IBP.
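Put together, the request handling in the server roughly follows the flow sketched below. The helper functions it calls are introduced one by one in the remaining sections and are assumed to be in scope; the scoring table name is a placeholder, and the waiting between the asynchronous Vertex AI steps is only indicated in comments.

```python
# Simplified orchestration sketch (not the exact code from the repository).
# train_rows / scoring_rows are planning data pulled from SAP IBP and already
# converted into BigQuery-compatible JSON rows.
def run_external_forecast(train_rows, scoring_rows):
    create_bq_dataset("ibp_eft")
    create_table("demand_qty_py01", "ibp_eft")        # training data
    create_table("demand_qty_scoring01", "ibp_eft")   # scoring data (hypothetical name)
    insert_bq_table("demand_qty_py01", "ibp_eft", train_rows)
    insert_bq_table("demand_qty_scoring01", "ibp_eft", scoring_rows)
    # The Vertex AI steps are asynchronous; the real server waits or polls
    # between them instead of calling them back to back:
    # 1. create_vi_dataset(...)       -> dataset from the BigQuery training table
    # 2. create_vi_model(...)         -> training pipeline / model
    # 3. create_batch_prediction(...) -> forecast on the scoring table
    # 4. download the results and post them back to SAP IBP
```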
BigQuery Tables
To create a dataset for training a model or to run a batch prediction job with Google Vertex AI, the data must first be available on the Google Cloud Platform. This can be done via a CSV file upload or via a BigQuery table. Using REST, gRPC, or the SDK, it is possible to automate this step instead of creating everything manually in the user interface. The following Python snippet was used to create the dataset; I passed a parameter called dataset_id with the value "ibp_eft".
def create_bq_dataset(dataset_id) -> None:
    from google.cloud import bigquery
    bq_client = bigquery.Client()
    ds = bq_client.create_dataset(dataset_id)
    print(
        "Created dataset {}.{}".format(ds.project, ds.dataset_id)
    )
Once you have a dataset, we create a table in it to upload the data to the Google Cloud Platform. You need a fair understanding of the data you want to send, because it is reflected in the table schema. I am sending product, location, and customer IDs as strings. I have a timestamp column that directly matches the planning level defined in my planning view in SAP IBP; its data type is TIMESTAMP. Additionally, a column of type INTEGER is needed to hold the values of the consensus demand quantity, which I use as the example key figure for my forecast. This column stores the historic values for the respective product, and it is also the column for which Google Vertex AI later calculates the forecast predictions. The following Python snippet was used to create the BigQuery table; I passed a parameter called dataset_id with the value "ibp_eft", and my table name was "demand_qty_py01".
def create_table(table_name, dataset_id) -> None:
    from google.cloud import bigquery
    schema = [
        bigquery.SchemaField("prdid", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("locid", "STRING"),
        bigquery.SchemaField("custid", "STRING"),
        bigquery.SchemaField("timestamp", "TIMESTAMP"),
        bigquery.SchemaField("cdemandQty", "INTEGER"),
    ]
    table_id = str(project + "." + dataset_id + "." + table_name)
    table = bigquery.Table(table_id, schema=schema)
    bq_client = bigquery.Client()
    table = bq_client.create_table(table)
    print(
        "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
    )
The above snippet is self-explanatory for an experienced Python programmer: we import the BigQuery package, create a BigQuery client, and then use the SDK API create_table to create a BigQuery table in the dataset. If you look closely at the variable table_id, you can see that it is a combination of the project name created in the beginning, the dataset name, and the table name. When the call returns successfully, it only means that the BigQuery service was able to parse your payload; it does not mean the table already exists. Table creation is an asynchronous process, and the BigQuery service can take a few more seconds after the call returns before the table is actually available in your project space. If you try to send data immediately after the call returns, the operation will probably fail, so it is better to wait a few seconds and then use the table name to upload the data. The following Python snippet was used to send data to the BigQuery table. I passed a parameter called dataset_id with the value "ibp_eft", my table name was "demand_qty_py01", and the ibp_data variable contained the rows I wanted to upload, in JSON format.
def insert_bq_table(table_name, dataset_id, ibp_data) -> None:
    from google.cloud import bigquery
    table_id = str(project + "." + dataset_id + "." + table_name)
    bq_client = bigquery.Client()
    errors = bq_client.insert_rows_json(table_id, ibp_data)
    if errors == []:
        print("IBP data is inserted.")
    else:
        print("Errors while inserting rows: {}".format(errors))
Once the data is stored in the BigQuery table, you can start with the next step of creating a Vertex AI Model which can be used for the actual forecast prediction.
Vertex AI Models and training
Like any other AI scenario, there are two stages in the Google Vertex AI service: training and scoring. In the training stage you upload the raw data to train a model, which is then used for the later scoring stage, in our case forecasting. The Vertex AI model is created using the data we uploaded to the BigQuery table in the previous step. To create a model, we first need to create a dataset in Vertex AI. The following snippet can be used to create one.
async def create_vi_dataset(dataset_display_name, bigquery_source_uri):
    vi_ds_client = aiplatform_v1.DatasetServiceAsyncClient(client_options=client_options)
    metadata_dict = {"input_config": {"bigquery_source": {"uri": bigquery_source_uri}}}
    metadata = json_format.ParseDict(metadata_dict, Value())
    dataset = aiplatform_v1.Dataset()
    dataset.display_name = dataset_display_name
    dataset.metadata_schema_uri = "gs://google-cloud-aiplatform/schema/dataset/metadata/time_series_1.0.0.yaml"
    dataset.metadata = metadata
    request = aiplatform_v1.CreateDatasetRequest(
        parent=parent,
        dataset=dataset,
    )
    operation = await vi_ds_client.create_dataset(request=request)
    print("Waiting for dataset to be created to complete...")
    response = await operation.result()
    print(response)
In this snippet you can see that we are using the Python SDK for Google Vertex AI. The aiplatform_v1 package contains different types of clients; we use the dataset service client to create a dataset in Vertex AI. It needs the URI of the BigQuery table, which has the following format:
"bq://" + project + "." + dataset_id + "." + inputTableName
Here, project is the name of your project, dataset_id is the ID of the BigQuery dataset you created before, and the last part is the table name.
The dataset definition is a JSON structure consisting of key-value pairs based on the type definition of a dataset. The metadata for the dataset must be created based on a schema; Google Vertex AI supports multiple schemas, and the one we use here is defined for time series data. You also need a dataset name. Finally, we create the dataset using the CreateDatasetRequest API from the Python SDK.
Once you have a dataset, you can create a Vertex AI model. Model creation in Vertex AI can be done from the user interface; however, if you want to do it via code, you need to prepare a training pipeline JSON structure. This structure contains a transformations array that defines the columns to be considered for data transformation. It also contains the data granularity definition, which corresponds to the planning level defined in your planning view, in our case the day level. It is important to specify the time column, which represents the timestamp; in the BigQuery table definition we defined a column called timestamp that is used for this purpose. We also need a mandatory time series identifier column, which corresponds to the key figure grouping from the SAP IBP system; in our example, the product ID column prdid is used as this identifier. Each identifier is represented together with its timestamp and associated with a target column, which in our example is cdemandQty. Here is an example of the table rows.
Sample table with product, location, customer and demand with planning level
In the table above you can see two different products, A and B, which have a demand of 10 and 11 on the 1st of February. Mapped to the JSON structure described above, prdid is the time series identifier column, timestamp is the time column, and cdemandQty is the target column for the model in Vertex AI. Below is a Python function that builds this training pipeline structure, creates a pipeline service client from the Google Vertex AI SDK, and then creates the training pipeline using the API.
async def create_vi_model(pipeline_display_name, model_display_name, dataset_id, p, l):
    transformations = [
        {"timestamp": {"column_name": "timestamp", "invalidValuesAllowed": "false"}},
        {"auto": {"column_name": "cdemandQty"}},
    ]
    data_granularity = {"unit": "day", "quantity": 1}
    training_task_inputs_dict = {
        "targetColumn": "cdemandQty",
        "timeSeriesIdentifierColumn": "prdid",
        "timeColumn": "timestamp",
        "transformations": transformations,
        "dataGranularity": data_granularity,
        "optimizationObjective": "minimize-rmse",
        "trainBudgetMilliNodeHours": 8000,
        "unavailableAtForecastColumns": ['cdemandQty', 'locid', 'custid'],
        "availableAtForecastColumns": ['timestamp'],
        "forecastHorizon": 10,
    }
    training_task_inputs = json_format.ParseDict(training_task_inputs_dict, Value())
    vi_pipe_client = aiplatform_v1.PipelineServiceClient(client_options=client_options)
    training_pipeline = {
        "display_name": pipeline_display_name,
        "training_task_definition": "gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_forecasting_1.0.0.yaml",
        "training_task_inputs": training_task_inputs,
        "input_data_config": {
            "dataset_id": dataset_id,
            "fraction_split": {
                "training_fraction": 0.8,
                "validation_fraction": 0.1,
                "test_fraction": 0.1,
            },
        },
        "model_to_upload": {"display_name": model_display_name},
    }
    parent = f"projects/{p}/locations/{l}"
    print("Waiting model creation...")
    response = vi_pipe_client.create_training_pipeline(
        parent=parent, training_pipeline=training_pipeline
    )
    print("response:", response)
You can notice two interesting parameters passed to this function: p and l. Both are used to build the variable parent; p represents the project name and l the location where the model is to be created. As of writing this blog, a model can only be created in the same location where the BigQuery dataset used for training is available. You might also want to cross-check whether the APIs are supported in the location where your project is created. The training pipeline JSON structure must reference a YAML definition that defines the schema for the different AI model types; since we want to do forecasting, it points to the automl_forecasting_1.0.0.yaml file provided by Vertex AI. If you have a large number of rows for training, creating the model will take longer.
In the sample data above, we had only one time series identifier, prdid (product ID). We did have attributes associated with the product, such as the location and customer IDs, but we did not use them. Vertex AI creates only one main model if there are no such associations. If there were associations, Vertex AI would create additional sub-models for each of them; this would take longer during model creation, but of course it would also increase the robustness of the model.
Batch prediction on Vertex AI
After the model is created in Vertex AI, the user usually gets an email notification about the completion. Whether the model creation fails or succeeds, the email is sent to the owner of the project.
So far we have created a dataset in Vertex AI and trained a model using data from SAP IBP. To run a forecast, we need a table in BigQuery that holds the raw data required for scoring, i.e. the actual prediction. It must have the same schema as the training data table in BigQuery. Additionally, the table must contain rows for the future dates for which a forecast prediction is needed; these rows must not contain any value in the target column. In Python you can use NaN (not a number), or simply omit the key-value pair from the JSON structure for those future dates. This data must be uploaded to a new BigQuery table, and you need the URI of this table as well as the URI of a BigQuery destination where the forecast prediction results should be stored. A sketch of such scoring rows is shown below, followed by the call that triggers the batch prediction.
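The rows below illustrate what the scoring data could look like; the location and customer IDs and the scoring table name are placeholders, and only the future row for which a prediction is wanted omits the cdemandQty key.

```python
# Sketch: scoring rows uploaded to a separate BigQuery table (hypothetical
# name "demand_qty_scoring01"). The last row is a future date and therefore
# carries no cdemandQty value; Vertex AI predicts it.
scoring_rows = [
    {"prdid": "A", "locid": "LOC1", "custid": "CUST1",
     "timestamp": "2022-02-01 00:00:00", "cdemandQty": 10},
    {"prdid": "A", "locid": "LOC1", "custid": "CUST1",
     "timestamp": "2022-02-02 00:00:00"},  # future date, no target value
]
insert_bq_table("demand_qty_scoring01", "ibp_eft", scoring_rows)
```

Batch prediction itself is then initiated via a call to Google Vertex AI with a set of parameters. The snippet below was used to trigger the job.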
async def create_batch_prediction(batch_prediction_display_name, model_name, predictions_format, source_folder, target_folder, p, l):
    batch_prediction_job = {
        "display_name": batch_prediction_display_name,
        "model": model_name,
        "input_config": {
            "instances_format": predictions_format,
            "bigquery_source": {"input_uri": source_folder},
        },
        "output_config": {
            "predictions_format": predictions_format,
            "bigquery_destination": {"output_uri": target_folder},
        },
    }
    batch_prediction_inputs = json_format.ParseDict(batch_prediction_job, Value())
    vi_job_client = aiplatform_v1beta1.JobServiceClient(client_options=client_options)
    parent = f"projects/{p}/locations/{l}"
    print("Waiting model creation...")
    response = vi_job_client.create_batch_prediction_job(
        parent=parent, batch_prediction_job=batch_prediction_job
    )
    print("response:", response)
We explicitly used "bigquery" as the prediction format. Alternatively, one can use CSV as the format and store the results in a Cloud Storage location. In both cases, the results can be downloaded from the Google Cloud console or programmatically via the SDK or the APIs.
Prediction results to Planning View
Forecast prediction on Google Vertex AI is currently a batch job; it is not an online prediction as of writing this blog. Hence, we create the batch job and wait patiently for the results. Once the computation is done, we get an email notification with details about where the results are stored, based on the input parameters of the batch prediction job. In this example we stored the results in a BigQuery table, from where we can download the data and reformat it into the API payload needed by SAP IBP.
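To read the results back, the destination table can simply be queried. The sketch below is an assumption about this step rather than code from the repository: Vertex AI forecasting batch jobs write their output to a BigQuery table under the given destination, typically with a predicted_<target> column (here predicted_cdemandQty); the exact table name is part of the job details and the email notification.

```python
# Sketch: read the batch prediction output from BigQuery. The destination
# table ID passed in is a placeholder for the table created by the batch job.
from google.cloud import bigquery

def fetch_predictions(destination_table_id):
    bq_client = bigquery.Client()
    query = f"""
        SELECT prdid, `timestamp` AS ts, predicted_cdemandQty.value AS forecast
        FROM `{destination_table_id}`
        ORDER BY prdid, ts
    """
    return [dict(row) for row in bq_client.query(query).result()]
```

The downloaded rows are then grouped per planning object into the results dictionary (key figure name to result values) used in the following snippet, which builds the payload and posts it back to SAP IBP.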
if len(results.keys()):
    for key_figure_name, key_figure_result in results.items():
        key_figure_data = {
            "RequestID": request_id,
            "GroupID": planning_object["GroupID"],
            "SemanticKeyFigure": key_figure_name,
            "ResultData": key_figure_result,
        }
        output["_AlgorithmDataOutput"].append(key_figure_data)
    message = {
        "RequestID": request_id,
        "GroupID": planning_object["GroupID"],
        "MessageSequence": 1,
        "MessageType": "I",
        "MessageText": "Okay",
    }
    output["_Message"].append(message)
if len(results.keys()):
    msg_header = {
        "RequestID": request_id,
        "GroupID": -1,
        "MessageSequence": 1,
        "MessageType": "I",
        "MessageText": f"{algorithm_name}",
    }
    output["_Message"].append(msg_header)
else:
    msg_header = {
        "RequestID": request_id,
        "GroupID": -1,
        "MessageSequence": 1,
        "MessageType": "E",
        "MessageText": f"Forecast calculation failed! Algorithm: {algorithm_name} .",
    }
    output["_Message"].append(msg_header)
output_json = json.dumps(output)
token_request = requests.get(RESULT_URL, headers={"x-csrf-token": "fetch", "accept": "application/json"},
                             cookies=cookies, verify=False)
if token_request.status_code == 200:
    result_send_post = requests.post(RESULT_URL, str(output_json), cookies=cookies,
                                     headers={"x-csrf-token": token_request.headers["x-csrf-token"],
                                              "Content-Type": "application/json", "OData-Version": "4.0"},
                                     verify=False)
    print(
        f"Forecast result for id {request_id} sent back to IBP system! Status code: {result_send_post.status_code}.", flush=True)
The computed data from the batch prediction job is available in the cdemandQty column. It is passed as the value of the "ResultData" key for a specific product, which belongs to the group ID of your planning object.
The complete source code can be found in this repository for your reference.
I find that customers can benefit further from Google Vertex AI because you can bring your own algorithms, written in the most popular AI frameworks, to a Compute Engine instance inside the Vertex AI Workbench. With such a deep learning virtual machine image on Google Vertex AI, you do not have to worry about the infrastructure components. Combined with carefully prepared data from SAP IBP, one can seamlessly reap the benefits of both platforms.
Conclusion
SAP IBP offers a variety of algorithms that are used very successfully by a large number of productive customers. This activity was therefore not about investigating which forecast algorithms are better or worse, but about illustrating additional options available to SAP IBP customers by leveraging the Google Vertex AI platform. From my experience working with SAP IBP and Google Vertex AI, I see that both platforms together bring tremendous flexibility for customers to fine-tune supply chain planning in a combined tool set. On one side, you can represent business data for any kind of process in SAP IBP; on the other side, you can now define your own constraints or capabilities in an algorithm, bring it to a machine learning platform, apply your existing business data, and visualize the results after the computation. By complementing each other, both platforms improve operational visibility and drive innovation in supply chain planning.
Special thanks to my friends at Google and SAP who supported me during this work. Thank you for reading; I would be happy to hear from you.
Domnic Savio Benedict
Note: This blog was originally published on Medium