For many companies, data strategy involves storing business data in independent silos across different repositories. Some of that data may even span different cloud providers (for cost and other reasons), which brings new challenges: data fragmentation, data duplication, and loss of data context. SAP Datasphere helps bridge siloed and cross-cloud SAP and non-SAP data sources, enabling businesses to get richer business insights while keeping the data at its original location, eliminating the need for data duplication and time-consuming ETL processes.

Databricks Lakehouse is a popular cloud data platform used for housing business, operational, and historical data in its delta lake.

In this blog, let's see how to do unified analytics in SAP Analytics Cloud by creating unified business models that combine federated non-SAP data from Databricks with SAP business data to derive real-time business insights.


The integration of Databricks and SAP BTP can be summarized in five simple steps: 

Step 1: Identify the source delta lake data in Databricks.

Step 2: Prepare to connect Databricks to SAP Datasphere.

Step 3: Connect Databricks as a source in SAP Datasphere connections.

Step 4: Create an analytical dataset in SAP Datasphere to join live SAP and non-SAP (Databricks) data into one unified semantic model.

Step 5: Connect live to this unified analytical data model from SAP Analytics Cloud and create visualizations that help illustrate quick business insights.

 

The Details 

STEP 1: Identify the source delta lake data in Databricks.

  1. For this blog, we will federate IoT data from the Databricks delta lake and combine it with customer master data from SAP sources (see the quick preview sketch after the screenshots below).


IoT Data in Databricks


Customer Master Data
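
To confirm the source data before federating it, you can preview the Delta table in a Databricks notebook. Here is a minimal sketch in PySpark, assuming an illustrative table name iot_truck_data (spark and display are provided by the Databricks notebook environment):

# Preview the Delta table that will be federated to SAP Datasphere.
# "iot_truck_data" is an illustrative name; replace it with your table.
df = spark.read.table("iot_truck_data")
df.printSchema()          # confirm column names and types
display(df.limit(10))     # eyeball a few rows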

 

 STEP 2: Prepare to connect Databricks to SAP Datasphere. 

 

  1. Go to your Databricks SQL Warehouse, open the Connection details tab as shown below, and copy the JDBC URL.

JDBC Connectivity info from Databricks

 

2. Go to User Settings -> Generate New Token, then copy and note down the token.

 

3. Rewrite the JDBC string we copied in point 1 above, removing the UID and PWD parameters and adding two new ones, IgnoreTransactions=1 and UseNativeQuery=0, as shown below (a small Python sketch of this rewrite follows the example URL).

jdbc:databricks://adb-<id>.19.azuredatabricks.net:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/<id>;IgnoreTransactions=1;UseNativeQuery=0 
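
If you would rather script this rewrite than edit the string by hand, here is a minimal Python sketch; the raw_url value is a placeholder for the URL you copied, and the two appended parameters are the ones named above:

# Rewrite the copied JDBC URL: drop UID/PWD, append IgnoreTransactions and UseNativeQuery.
raw_url = "jdbc:databricks://adb-<id>.19.azuredatabricks.net:443/default;transportMode=http;ssl=1;AuthMech=3;UID=token;PWD=<personal-access-token>;httpPath=/sql/1.0/warehouses/<id>"

parts = [p for p in raw_url.split(";") if not p.upper().startswith(("UID=", "PWD="))]
parts += ["IgnoreTransactions=1", "UseNativeQuery=0"]
print(";".join(parts))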

 

STEP 3: Connect Databricks as a source in SAP Datasphere:

Prerequisites: The Data Provisioning Agent (DP Agent) is installed and connected to SAP Datasphere. Make sure the DP Agent host can reach the Databricks cluster.
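
A quick way to check reachability is to run a small Python snippet on the DP Agent host; a minimal sketch, assuming the hostname and port from your JDBC URL:

# Run on the DP Agent host: confirm the Databricks endpoint is reachable over TCP.
import socket

host = "adb-<id>.19.azuredatabricks.net"   # hostname from your JDBC URL
port = 443                                 # port from your JDBC URL

try:
    with socket.create_connection((host, port), timeout=10):
        print(f"OK: {host}:{port} is reachable")
except OSError as exc:
    print(f"FAILED: cannot reach {host}:{port} -> {exc}")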

  1. Download the latest Databricks JDBC driver and copy it to the camel/lib directory of the DP Agent installation.
  2. Restart the DP Agent.
  3. Make sure the CamelJDBCAdapter is registered and turned on in SAP Datasphere by following the SAP help documentation.
  4. In SAP Datasphere Connections, create a Generic JDBC connection and enter the details as shown below, filling in the JDBC URL we formed earlier.

Username: token

Password: <the token we copied earlier from the Databricks User Settings>


SAP Datasphere Generic JDBC Connection Dialog

 

5. Create a remote table in the SAP Datasphere Data Builder for a Databricks table and preview it to check that data loads.


Remote Table in SAP Datasphere showing data from Databricks
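
If the preview does not load, it can help to test the driver jar, JDBC URL, and token outside of SAP Datasphere. Here is a minimal troubleshooting sketch using the third-party jaydebeapi Python package (our choice for illustration, not something SAP Datasphere requires); the jar path and the <id> placeholders are assumptions you should fill in:

# Standalone smoke test of the Databricks JDBC driver, URL, and token.
# pip install jaydebeapi
import jaydebeapi

conn = jaydebeapi.connect(
    "com.databricks.client.jdbc.Driver",             # driver class inside the jar
    "jdbc:databricks://adb-<id>.19.azuredatabricks.net:443/default;"
    "transportMode=http;ssl=1;AuthMech=3;"
    "httpPath=/sql/1.0/warehouses/<id>;"
    "IgnoreTransactions=1;UseNativeQuery=0",
    ["token", "<personal-access-token>"],            # same credentials as the connection dialog
    "/path/to/DatabricksJDBC42.jar",                 # the jar copied to camel/lib
)
cursor = conn.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchall())                             # [(1,)] means the round trip works
cursor.close()
conn.close()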

 

STEP 4: Create an analytical dataset in SAP Datasphere to join live SAP and non-SAP (Databricks) data into one unified semantic model.

 

You can see the live query pushdowns happening on the Databricks compute cluster in the Log4j logs when data is previewed in SAP Datasphere models.
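
For intuition, the join behind the unified model is logically equivalent to the PySpark sketch below. Table and column names are illustrative; in SAP Datasphere the join is modeled in the Data Builder, and the Databricks side stays federated rather than copied:

# Illustrative only: the analytical dataset joins live Databricks IoT data
# with SAP master data on a shared key. All names here are assumed.
iot = spark.read.table("iot_truck_data")             # federated from Databricks
customers = spark.read.table("sap_customer_master")  # from the SAP source

unified = (
    iot.join(customers, on="customer_id", how="inner")
       .select("customer_id", "customer_name", "truck_id",
               "shipment_status", "event_timestamp")
)
display(unified.limit(10))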

 

STEP 5: Connect live to this unified analytical data model from SAP Analytics Cloud and create visualizations that illustrate quick business insights.

For example, the dashboard below shows real-time truck and shipment status for customer shipments. The live IoT data from the Databricks delta lake, which holds the real-time truck data, is federated and combined with customer and shipment master data from SAP systems into a unified model used for efficient, real-time analytics.


SAP Analytics Cloud Story Dashboard – Visualizing live data from Databricks

 

We hope this quick tutorial helps you in your data journey and in exploring the exciting new features available in SAP Datasphere. We'd love to get your thoughts and opinions, so please leave us a comment below. And don't forget to give us a like if you found this blog especially useful! Thanks for reading!

Please read our next blog here to learn how the FedML-Databricks library can be used to federate live data from SAP Datasphere's unified semantic models for machine learning on the Databricks platform.

For more information about this topic or to ask a question, please contact us at paa@sap.com  
