Welcome to this technical blog post where we’ll delve into generating SAP Datasphere SQL views from SAP Datasphere remote tables, and subsequently deploying them into SAP Datasphere using the SAP Datasphere Command-Line Client (CLI). Most of the text in this blog consists of the actual Python code that accomplishes this task.

Use case

Let’s set the stage with some background. Recently, I handled a case in which a customer used SAP Datasphere to create a considerable number of remote tables. Apart from modelling these tables within SAP Datasphere, the objective was to make them accessible to external tools as well, using a database user that connects to the Open SQL schema. Another requirement was that HDI containers should be able to read these remote tables.

However, remote tables, at the time of writing, cannot be exposed directly to the Open SQL schema or to an HDI container; the option to grant these privileges to Open SQL schema users simply does not exist in the UI. A solution is to encapsulate each remote table within a simple view, one that does nothing more than select all of the remote table’s columns, and make these views accessible to the Open SQL schema. This solution, although effective, would need to be repeated manually for hundreds of tables, and redone whenever an underlying remote table is modified. Therefore we turned to object generation.

Object generation

All modelling objects in SAP Datasphere are stored as text-based objects, known as CSN (Core Schema Notation) models. They can be exported and imported using the SAP Datasphere user interface, but also using the Command-Line Client that comes with the product. This allows us to solve the use case: take a remote table CSN and create a brand-new view CSN from it. Since the remote table CSN already contains all the metadata we need, all we have to do is write a script that performs the actual conversion. I’ve done this in Python; the code is pasted below. Because there is a 1:1 relation between the SQL view and the remote table, the conversion isn’t hard: we read the metadata of the remote table, essentially the column names plus some additional properties, and write it into a view definition. A SQL view definition is preferable to a graphical view definition, as its structure is simpler and therefore easier to generate.
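
To make this concrete, below is a simplified, hand-written example of the transformation. The names and types are hypothetical, and a real remote table CSN carries additional connection metadata; the script only relies on the "definitions" and "elements" sections shown here.

# Hypothetical, simplified remote table CSN as stored in the repository
rt_csn = {
    "definitions": {
        "SALES_ORDERS": {
            "kind": "entity",
            "elements": {
                "ID": {"type": "cds.Integer"},
                "AMOUNT": {"type": "cds.Decimal"}
            }
        }
    }
}

# The generated view CSN reuses the same elements and wraps them in a query,
# effectively: SELECT "ID", "AMOUNT" FROM "SALES_ORDERS", published as view SALES_ORDERS_V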

Pre-requisites

  • Install Datasphere CLI (npm install -g @sap/dwc-cli or check this SAP help page)
  • Create Database Analysis User (with Space Schema access) on Datasphere for connection to HDB, see this SAP Help page.
  • Prepare DSP CLI secrets file according to this SAP help page.
  • If you want to avoid a login process while running the code, adapt the code under “Logon procedure”. See this SAP help page.
  • To use the hdbcli package, install using pip: pip3 install hdbcli
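
Once the Database Analysis User is created and hdbcli is installed, a minimal connectivity check can save you some debugging later. This is just a sketch: the placeholders match the static variables used in the script below, and SELECT CURRENT_USER FROM DUMMY is simply a trivial test query.

from hdbcli import dbapi

# Smoke test: connect to the SAP HANA database underneath SAP Datasphere
conn = dbapi.connect(
    address='<your-hdb-address>',
    port=443,
    user='<your-hdb-user>',
    password='<your-hdb-password>'
)
cursor = conn.cursor()
cursor.execute('SELECT CURRENT_USER FROM DUMMY')
print(cursor.fetchone()[0])  # prints the connected database user if all is well
conn.close()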

Please note I used version 2023.4.0 of the Command-Line Interface, which at the time of writing is the latest available version.

The code

Below is the full Python code to generate SQL views from remote tables and push them into SAP Datasphere. After inserting your SAP Datasphere connection info, you can basically run the script, open up your SAP Datasphere Data Builder file overview, and wait for the generated view definitions to pop up. Alternatively, you can watch the log statements being printed to see what is going right or wrong ;-).

A few additional notes:

  • You will have to adjust the static variables to connect to SAP Datasphere and the underlying SAP HANA database;
  • The code contains a SQL query that selects all remote tables inside a space. Adjust this query if you want to generate views for only a subset; see the sketch after this list;
  • View definitions are pushed one by one. Technically, objects can be combined in one space definition, but that might make the resulting CSN too large to be successfully processed by SAP Datasphere;
  • At this moment a wait function is implemented to allow the deployment of a view to finish before the next view is pushed. Without it, the next view would be pushed while a deployment is still pending, which might block that deployment. I’ll update this blog once SAP Datasphere no longer needs this client-side wait function;
  • Access to the remote table metadata is provided through the SAP HANA database. This proved easier than fetching the object definitions by reading the space definition;
  • In the code I deliberately used the long names for the CLI options, as these are more readable and static in nature, which keeps the code compatible with future versions.
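
As an example of the second note: to generate views for only a few remote tables, the inner WHERE clause of the metadata query in the script can be extended. The table names below are hypothetical:

# Hypothetical subset filter: generate views only for these remote tables
rt_subset = ('SALES_ORDERS', 'CUSTOMERS')
in_list = ', '.join(f"'{name}'" for name in rt_subset)
# Then extend the inner WHERE clause of the script's metadata query to:
#   WHERE PLUGIN_NAME = 'remoteTable' AND ARTIFACT_NAME IN ({in_list})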


# -----------------
# Introduction
# -----------------
# Datasphere (DSP) View generation based on Remote Tables
# This Python script generates design time SQL views in DSP for each Remote Table (RT) that 
# resides in a given space and then pushes the view definition to DSP via the DSP Command Line Interface.
# The view is "exposed" so that it can be accessed from the Open SQL schema
#
# At the time of writing, a wait function is implemented to prevent parallel deployments; this might not be needed in the future.
#
# Author: Sefan Linders
# Date of last change: 16/5/2023

# Prereqs
# - Install Datasphere CLI (npm install -g @sap/dwc-cli)
# - Create Database Analysis User (with Space Schema access) on Datasphere for connection to HDB. https://help.sap.com/docs/SAP_DATASPHERE/9f804b8efa8043539289f42f372c4862/c28145bcb76c4415a1ec6265dd2a4c11.html?locale=en-US
# - Prepare DSP CLI secrets file according to https://help.sap.com/docs/SAP_DATASPHERE/d0ecd6f297ac40249072a44df0549c1a/eb7228a171a842fa84e48c899d48c970.html?locale=en-US#log-in-by-passing-oauth-client-information-in-a-secrets-file
# - If you want to avoid a login process, use the following instructions and adapt code under "Logon procedure": https://help.sap.com/docs/SAP_DATASPHERE/d0ecd6f297ac40249072a44df0549c1a/eb7228a171a842fa84e48c899d48c970.html?locale=en-US#avoid-running-a-login-command-by-extracting-access-and-refresh-tokens
# - To use the hdbcli package, install using pip: pip3 install hdbcli

# -----------------------------------------------------------
# Static variables, these you have to adjust
# -----------------------------------------------------------

# dsp
dsp_host = 'https://<your-host-url>'  # Placeholder for the SAP Datasphere URL
dsp_space = '<your-dsp-space>'  # Placeholder for the technical space name

# hdb
hdb_address='<your-hdb-address>'  # Placeholder for the SAP HANA database (hdb) address
hdb_port=443  # Port for the hdb, usually 443
hdb_user='<your-hdb-user>'  # Placeholder for the hdb user
hdb_password='<your-hdb-password>'  # Placeholder for the hdb password

# settings
view_suffix = '_V'  # Suffix for the generated views
secrets_file = '<path-to-your>/dwc_cli_secrets.json'  # Path for the secrets file
deploy_wait_time_in_seconds = 35  # Time to wait for deployment
export_folder_path = '<path-to-your>/scripts/dsp_cli_view_generation/'  # Path to export folder


# -----------------------------------------------------------
# Package import
# -----------------------------------------------------------

import subprocess # For OS commands on DSP CLI
import json # For handling the CSN files which are in JSON format
import time # For the wait function, to let an object deployment finish before starting the next
from hdbcli import dbapi # To connect to SAP HANA Database underneath SAP Datasphere to fetch Remote Table metadata

# -----------------
# print DSP CLI version
# -----------------
command = 'dwc --version'
print(command)
subprocess.run(command, shell=True)

# -----------------------------------------------------------
# CLI logon procedure with oAuth authentication to DSP CLI
# -----------------------------------------------------------

# logout is needed to have the login consider new creds file, e.g., in case it is replaced with new client id/secret
command = 'dwc logout'
print(command)
subprocess.run(command, shell=True)

# login
command = f'dwc login --secrets-file {secrets_file}'
print(command)
subprocess.run(command, shell=True)

# Optional command to debug or to get the access and refresh token to avoid login command (see header comments)
# command = 'dwc secrets show'
# print(command)
# subprocess.run(command, shell=True)

# initialize with host name
command = f'dwc cache init --host "{dsp_host}"'
print(command)
subprocess.run(command, shell=True)

# -----------------------------------------------------------
# Wait function, used to prevent overload of space deployment
# -----------------------------------------------------------
def wait(seconds):
    for i in range(seconds):
        if i > 0 and i % 10 == 0:
            print()
        print('.', end='', flush=True)
        time.sleep(1)
    print()

# -----------------------------------------------------------
# This function takes a Remote Table (RT) CSN as input and transforms it to a SQL View CSN
# -----------------------------------------------------------
def remote_table_to_view(rt_csn):
    rt_name = list(rt_csn['definitions'].keys())[0]
    view_name = rt_name + view_suffix

    elements = rt_csn["definitions"][rt_name]["elements"]
    elements_string = ', '.join(f'"{element}"' for element in elements)

    view_csn = {
        "definitions": {
            view_name: {
                "kind": "entity",
                "elements": elements,
                "query": {
                    "SELECT": {
                        "from": {"ref": [rt_name]},
                        "columns": [{"ref": [element]} for element in elements]
                    }
                },
                "@EndUserText.label": view_name,
                "@ObjectModel.modelingPattern": {"#": "DATA_STRUCTURE"},
                "@ObjectModel.supportedCapabilities": [{"#": "DATA_STRUCTURE"}],
                "@DataWarehouse.consumption.external": True,
                "@DataWarehouse.sqlEditor.query": f'SELECT {elements_string}\nFROM "{rt_name}"'
            }
        }
    }
    
    return view_csn

# -----------------------------------------------------------
# This function is not used in the generation process, but can help if an export of the space csn is needed for comparison purposes
# Export the space definition into a csn file, restricted to the given entity definition
# -----------------------------------------------------------

def export_space_def(rt_name):
    space_file_read = f'{export_folder_path}space_definition_{dsp_space}.csn'
    command = f'dwc spaces read --host {dsp_host} --space {dsp_space} --definitions {rt_name} --output {space_file_read}'
    print(command)
    subprocess.run(command, shell=True)
    with open(space_file_read, 'r') as f:
        space_csn = json.load(f)
    space_csn_dump = json.dumps(space_csn, indent=4)
    print(space_csn_dump)

# -----------------------------------------------------------
# Generates space csn (including view definition) with view csn as input
# -----------------------------------------------------------
def generate_space_csn_with_view(view_csn):
    return { dsp_space: view_csn }

# -----------------------------------------------------------
# Write space definition to csn file and return file name
# -----------------------------------------------------------
def write_space_csn(space_csn):
    space_csn_pretty = json.dumps(space_csn, indent=4)
    view_name = next(iter(space_csn[next(iter(space_csn.keys()))]["definitions"].keys()))
    space_csn_file = f'{export_folder_path}space_{dsp_space}_object_{view_name}.csn'
    with open(space_csn_file, 'w') as f:
        f.write(space_csn_pretty)
    return space_csn_file

# -----------------------------------------------------------
# Push view csn to DSP with space definition csn as input
# -----------------------------------------------------------
def push_space_csn_to_DSP(space_csn_file):
    command = f'dwc spaces create --host {dsp_host} --space {dsp_space} --file-path {space_csn_file} --force-definition-deployment --verbose'
    print(command)
    subprocess.run(command, shell=True)

# -----------------------------------------------------------
# Fetch RT metadata from HANA DB, and for each RT create a view and push this to DSP
# -----------------------------------------------------------

# Connect to HDB
conn = dbapi.connect(
    address=hdb_address,
    port=hdb_port,
    user=hdb_user,
    password=hdb_password
)
cursor = conn.cursor()

# SELECT statement to fetch remote table CSNs. Selects the highest ARTIFACT_VERSION for each object.
st = f'''
    SELECT A.ARTIFACT_NAME, A.CSN, A.ARTIFACT_VERSION
    FROM "{dsp_space}$TEC"."$$DeployArtifacts$$" A
    INNER JOIN (
      SELECT ARTIFACT_NAME, MAX(ARTIFACT_VERSION) AS MAX_ARTIFACT_VERSION
      FROM "{dsp_space}$TEC"."$$DeployArtifacts$$"
      WHERE PLUGIN_NAME='remoteTable'
      GROUP BY ARTIFACT_NAME
    ) B
    ON A.ARTIFACT_NAME = B.ARTIFACT_NAME
    AND A.ARTIFACT_VERSION = B.MAX_ARTIFACT_VERSION;
'''
print('>>> SELECT statement to fetch remote table definitions')
print(st)
cursor.execute(st)

# Loop over the remote tables, create a view definition, and push its csn to DSP
rows = cursor.fetchall()
conn.close()
total_rows = len(rows)

for i, row in enumerate(rows):
    rt_name = row[0]
    csn = row[1]
    
    # Load remote table csn and print it
    rt_csn = json.loads(csn)
    rt_csn_pretty = json.dumps(rt_csn, indent=4)
    print('>>> Remote table csn of: ' + rt_name)
    print(rt_csn_pretty)
    
    # Get view definition based on remote table and print it
    view_csn = remote_table_to_view(rt_csn)
    view_csn_pretty = json.dumps(view_csn, indent=4)
    print('>>> Generated view csn of: ' + rt_name)
    print(view_csn_pretty)
    
    # Generate space csn
    space_csn_new = generate_space_csn_with_view(view_csn)
    space_csn_new_pretty = json.dumps(space_csn_new, indent=4)
    print('>>> Generated space csn of: ' + rt_name)
    print(space_csn_new_pretty)
          
    # Write space csn to file
    space_file_write = write_space_csn(space_csn_new)
          
    # Push space csn to DSP
    push_space_csn_to_DSP(space_file_write)
    
    # Check if it's the last row
    if i < total_rows - 1:
        # Wait to prevent overload of space deployment
        print(f'waiting {deploy_wait_time_in_seconds} seconds for space deployer to finish previous deploy')
        wait(deploy_wait_time_in_seconds)


Conclusion

You can think of many scenarios in which you would want to generate objects and push them into SAP Datasphere. In this blog I described a method to generate SAP Datasphere SQL views from remote tables and push them into the SAP Datasphere repository. I hope it helps you build your own generation scenario, and that it gives you an idea of what is possible with a bit of code and the SAP Datasphere CLI.
