Machine Learning capabilities have been part of SAP HANA since its earliest versions and have continuously evolved over time. Today, Machine Learning embedded in SAP HANA comes in two flavors: the Automated Predictive Library (APL) and the Predictive Analysis Library (PAL). While the automated Machine Learning of the APL is aimed primarily at developers and business analysts, the expert Machine Learning of the PAL is designed for data scientists. Alongside the features, the accompanying content has grown as well, providing hands-on tutorials, background information and updates on recent enhancements.

 

This blog post aims to provide a comprehensive overview of the most recent content. It’s meant to be a living document, so we’ll try to keep it as up to date as possible.

The links are structured as follows:

  • Getting started – Your first steps with SAP HANA Machine Learning
  • Machine Learning Operationalization – How to bring Machine Learning to life in real-world scenarios
  • Deep-Dive sections into:
    • Regression
    • Classification
    • Outlier and Anomaly Detection
    • Time-Series Analysis
    • Text Mining
  • External Machine Learning – How to integrate with your external R or TensorFlow servers
  • Miscellaneous – All other relevant materials.

Getting started

Denys van Kempen provides a broad overview of materials to get started with SAP HANA Machine Learning in this blog. You will find links to the documentation, recent demo videos and code samples in this collection:

With our simple SQL interface, developers and data scientists can easily work with all the features of the SAP HANA database and integrate them with any other SQL-based solution. While SQL is considered the third most popular language for Machine Learning, we are aware that other scripting languages, specifically Python and R, are even more popular with data scientists.

To get started with the Python and R APIs, you might want to take a look at our initial release blog post from Arun Godwin Patel:

Arun also wrote two great articles on the power of the SAP HANA dataframe object that is introduced with the APIs. It builds the foundation of all the Machine Learning features in the library, so understanding it is important for all the following articles:

Last but not least, he introduces us to the Exploratory Data Analysis feature. This toolset allows us to visually explore data using different graphics and charts, while benefiting from the remote execution of aggregation statements.

Are you looking for the ultimate step-by-step guide on how to get started with Python and Machine Learning in SAP HANA Cloud? Well, Andreas Forster has you covered with this article:

Too much Python around here so far from your perspective? Perfect timing to spotlight our R API!

Yannick Shaper prepared two articles on the basics of working with R and SAP HANA, and the integration of R-based SAP HANA Machine Learning within a Shiny App:

After taking the first steps into Machine Learning models, you might want to bring them to life in a production scenario. But before doing that, make sure the model meets your quality criteria!

Raymond Yao has prepared a great example of how to use the new Model Report for that. It assists in understanding and debriefing a trained model by displaying model statistics, variable importance and standard metrics.

 

Machine Learning Operationalization

Maintenance of the Machine Learning model lifecycle and especially versioning of different states of a model is an important part of making Machine Learning enterprise ready. This article from Xin Chen explains the details of how to set up a model storage in SAP HANA using the Python client for Machine Learning:

For managing and orchestrating large Data Science and Machine Learning architectures, SAP Data Intelligence comes into play. In this article, Andreas Forster explains how to leverage the Python client for SAP HANA Machine Learning with the Jupyter Notebook operator in Data Intelligence:

To make Machine Learning part of a complex data processing workflow, you can include SAP HANA Machine Learning in your SAP Data Intelligence pipelines. The following article from Andreas gives you all the details to get started:

Your data resides in SAP Data Warehouse Cloud together with your business reports, but you don’t want to miss the power of SAP HANA Machine Learning? We’ve got you covered! Learn how to leverage SAP HANA Machine Learning from DWC:

Your scenario requires SAP HANA Machine Learning on data stored in SAP HANA, but the predictions need to be executed in an independent environment? Deploying the JavaScript export of your APL model might be an option. Learn from Andreas how to do this in this article:

One of the most common scenarios for SAP HANA Machine Learning is the implementation in the context of ABAP-based SAP applications, like SAP S/4HANA or SAP BW/4HANA.

Jerome Zhao showcases how to call SAP HANA Machine Learning functions from ABAP in this article:

To provide a more sophisticated integration, especially with SAP S/4HANA, Intelligent Scenario Lifecycle Management (ISLM) was introduced. It orchestrates all Machine Learning activities, such as the creation of scenarios and models as well as the training, deployment and activation of those Machine Learning models.

Venkata Raghu Banda has prepared a comprehensive collection of materials on ISLM in this blog post:

Some of the highlights of his collection are:

You may also check our overview pages on Intelligent ERP and ISLM to get the latest updates:

 

Deep-dive sections

Let’s take a closer look into some of the most important scenarios for SAP HANA Machine Learning.

Regression

Andreas Forster created a nice demo on the use of regression techniques to predict used car prices, using the Python API.

Classification

Kurt Holst contributed a series of three blog posts focused on a classification scenario. He demonstrates the end-to-end implementation making use of the R API, as well as how to evaluate the business value of a model created that way:

Outlier and Anomaly Detection

Likun Hou has prepared four blog entries on different techniques for outlier and anomaly detection.

The first article demonstrates the usage of Statistical Tests (Variance Test and IQR Test) for Outlier Detection. Likun shows that the IQR test is the more robust outlier detection method when the targeted numerical feature contains values that deviate extremely from the mean or median. However, both methods only work on 1-dimensional numerical data, so they are mainly applicable to outliers with at least one outstanding numerical feature value.
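To make the difference between the two tests concrete, here is a minimal sketch in plain Python. It does not use the SAP HANA PAL functions from the article; the function names, the 3-sigma threshold, the 1.5 IQR multiplier and the sample data are assumptions chosen for illustration only.

```python
import statistics


def variance_test_outliers(values, sigma_num=3.0):
    """Flag values further than sigma_num standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) > sigma_num * std]


def iqr_test_outliers(values, multiplier=1.5):
    """Flag values outside [Q1 - multiplier * IQR, Q3 + multiplier * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical 1-dimensional feature: mostly 10..13, plus two suspicious values.
data = [10, 11, 12, 11, 10, 12, 11, 13, 10, 12, 60, 1000]

variance_outliers = variance_test_outliers(data)
iqr_outliers = iqr_test_outliers(data)
print("Variance test:", variance_outliers)
print("IQR test:     ", iqr_outliers)
```

In this sample, the extreme value 1000 inflates both the mean and the standard deviation, so the variance test no longer flags 60, while the IQR test, which relies on quartiles, still flags both values.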

The second option is to use the DBSCAN clustering algorithm to perform Outlier Detection. Unlike the statistical test approach above, all feature values can be taken into account if appropriate distance metrics are adopted.
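The idea behind DBSCAN-based outlier detection can be sketched in a few lines of plain Python: points that do not belong to any dense region are labeled as noise and treated as outliers. This is a simplified textbook version with Euclidean distance, not the PAL implementation; the parameter names `eps` and `min_pts` and the sample points are assumptions for illustration.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical 2-dimensional data: two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 0.5),
]
labels = dbscan(points, eps=0.6, min_pts=3)
outliers = [p for p, label in zip(points, labels) if label == -1]
print(labels)
print(outliers)
```

Because the distance is computed over all coordinates, every feature contributes to the decision, which is the advantage over the 1-dimensional statistical tests.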

Typically, these two methods (Statistical Test and Clustering) can only detect outliers in the input dataset, and the detection result cannot be generalized to new data points, because they do not produce a model. The third article demonstrates how Classification methods can be adopted to overcome this difficulty.

However, all the aforementioned techniques become less applicable when the dataset of interest is of high dimensionality (i.e. contains many features), or when the boundary between normal points and anomalous ones is complicated. In his fourth article, Likun demonstrates a better approach by manually labeling the anomalous points in the dataset, and then training a supervised Machine Learning model for the classification of normal points and anomalies.

Another approach to Anomaly Detection is based on sensor data over time, which requires the usage of time series analysis techniques. We have the basics of that covered in the section below. Nidhi Sawhney and Rafael Pacheco showcase two scenarios in these three articles:

Finally, Raymond Yao shows us how to apply Weibull Analysis – one of the most widely used algorithms for Predictive Maintenance use cases.

Time-Series Analysis

Another series of great articles from Likun Hou covers the most relevant aspects of Time-Series Analysis.

He starts off by explaining the basic principles of Time-Series Analysis, specifically the ideas of “Trends” and “Seasonality”, and how to perform a decomposition into these components to prepare for Anomaly Detection.
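As a rough illustration of what such a decomposition does, the following sketch implements a classical additive decomposition (series = trend + seasonality + residual) with a centered moving average in plain Python. It is not the PAL seasonal decomposition from the article; the function name, the synthetic series and the fixed period are assumptions.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal and residual parts (additive model)."""
    n = len(series)
    half = period // 2

    # Trend: centered moving average over one full period.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: the two edge values get half weight each.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period

    # Seasonality: average detrended value per position in the period,
    # shifted so that the seasonal effects sum to zero.
    means = []
    for k in range(period):
        values = [series[t] - trend[t]
                  for t in range(k, n, period) if trend[t] is not None]
        means.append(sum(values) / len(values))
    offset = sum(means) / period
    seasonal = [means[t % period] - offset for t in range(n)]

    # Residual: what is left after removing trend and seasonality.
    residual = [series[t] - trend[t] - seasonal[t] if trend[t] is not None else None
                for t in range(n)]
    return trend, seasonal, residual


# Synthetic example: linear trend plus a repeating pattern of length 4.
pattern = [3.0, -1.0, -2.0, 0.0]
series = [2.0 * t + pattern[t % 4] for t in range(16)]
trend, seasonal, residual = decompose_additive(series, period=4)
print(trend[2:6])
print(seasonal[:4])
```

On real data, the residual component is the natural place to look for anomalies, since points with unusually large residuals are explained by neither the trend nor the seasonal pattern.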

The second article explains how to apply the most commonly used techniques for Time-Series Analysis: Exponential Smoothing and ARIMA.
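For readers new to the topic, single exponential smoothing, the simplest member of the exponential smoothing family, can be sketched as follows. This is a plain Python illustration rather than the PAL function covered in the article; the function name, the smoothing factor `alpha` and the sample values are assumptions. ARIMA is not shown here.

```python
def single_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Each level is a weighted average of the newest observation and the
    previous level: level = alpha * x + (1 - alpha) * previous_level.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    level = series[0]  # initialize with the first observation
    smoothed = [level]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


observations = [10.0, 12.0, 14.0]
smoothed = single_exponential_smoothing(observations, alpha=0.5)
forecast = smoothed[-1]  # one-step-ahead forecast is the last level
print(smoothed, forecast)
```

A larger `alpha` makes the forecast react faster to recent observations, while a smaller `alpha` produces a smoother, slower-moving estimate.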

Lastly, Likun introduces one of the most recent enhancements of the Predictive Analysis Library (PAL): Additive Model Time-Series Analysis, an advanced approach that proves to be superior in dealing with time series that have complicated trends, multiple seasonalities as well as cyclic patterns.

Xin Chen dedicated another article to Seasonal Decomposition, showcasing examples of how this can be done with SAP HANA PAL.

While many Time-Series scenarios are based on just one time-dependent variable, there are also many cases where Time-Series consist of more than one time-dependent variable and each variable depends not only on its past values but also has some dependency on other variables. These Multivariate Time-Series are covered in this article:

Text Mining

The most recent enhancement to the SAP HANA Machine Learning features is the Text Mining feature. The initial version allows for the analysis and classification of texts, like service tickets or text messages, and enables users to explore relations among the texts. Learn how to make use of this feature in Alex Dalentzas blog post:

 

External Machine Learning

The third flavor of SAP HANA Machine Learning is the integration of external Machine Learning servers. It mainly allows us to remotely execute Machine Learning models in TensorFlow or R on separate servers, using data from an SAP HANA database (on-premises) and consuming the results back in SAP HANA.

These two articles provide an overview of the R server integration.

More information on the TensorFlow integration can be found in these posts from Philip Mugglestone and Nandi Kishore:

 

Miscellaneous

Sometimes, you have the right data, in the right place, at the right time, but your scenario requires them to be turned by 90 degrees. That can be cumbersome, but Nidhi Sawhney shows us how pivoting can be easily done using SAP HANA and the Python API.
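If the term is unfamiliar, pivoting turns rows in "long" format (one row per key and attribute) into "wide" format (one row per key, one column per attribute). The sketch below shows the principle on in-memory data in plain Python. It does not use the SAP HANA Python API from the article; the function name and the sample sensor readings are assumptions.

```python
from collections import defaultdict


def pivot(rows, index, columns, values):
    """Turn long-format rows into one wide row per distinct index value."""
    table = defaultdict(dict)
    for row in rows:
        table[row[index]][row[columns]] = row[values]
    column_names = sorted({row[columns] for row in rows})
    # Missing combinations are filled with None.
    return [
        {index: key, **{name: cells.get(name) for name in column_names}}
        for key, cells in table.items()
    ]


# Hypothetical long-format data: one row per sensor and month.
rows = [
    {"sensor": "A", "month": "Jan", "reading": 1.5},
    {"sensor": "A", "month": "Feb", "reading": 1.7},
    {"sensor": "B", "month": "Jan", "reading": 2.0},
]
wide = pivot(rows, index="sensor", columns="month", values="reading")
for row in wide:
    print(row)
```

Doing the same inside the database, as described in the article, avoids transferring the full long-format table to the client first.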

This article describes how to import multiple Excel files into a single SAP HANA table using the Python Machine Learning client for SAP HANA.

If you are looking for more code examples on the use of SAP HANA Machine Learning, please take a look at our sample repository on GitHub, where you will find dozens of examples for the various use cases of Machine Learning.

 

Thank you!

I would like to thank all the above contributors for their tremendous effort and time to create these valuable materials!

As said in the introduction, this article will be updated whenever new relevant content is created. If there is anything you miss in this collection (either because we missed it or because there are no resources on your specific topic), do not hesitate to reach out to Christoph Morgen (SAP HANA Product Management) or Matthias Menz (SAP HANA Solution Management).

Also, we are happy to take your feedback, thoughts or questions in the comment section below!

Randa Khaled

Author Since: November 19, 2020
