Harnessing Multi-Model Capabilities with Spotify – Processing Semi-Structured Data with SAP HANA Cloud/ SAP Datasphere – Part 2

In this part , we will be discussing about deployments in SAP HANA Cloud . We will be explaining the following :

Spotify Developer Access Set up & Configuration
Understanding Spotify APIs – Testing from the site directly
Scenario 1 – Python Scripts for Playlists , Audio features& Explanation
JSON Documents validation in SAP HANA Cloud
Querying on JSON Collection with different scenarios
Updating JSON Collections for Reporting
Creating SQL Views based on JSON Documents
Cloning the Git Repository based on the SQL Views
Discussion on next blogs focusing Datasphere Scenarios & Visualizations

If you want to revisit the architecture, please check this blog once.

Let’s start

Spotify Developer Access and Setup

In order to access the Spotify APIs, you need to set up the Spotify Dashboard by creating credentials for integrations. Login to Spotify Dashboard. Of course you can use your credentials of Spotify free or premium account.

Once logged in, chose the option “Create an App “, and it will provide you Client ID and Credentials. In my case, app is ValidateHANA.(Sure I could have gone with a better name !)

Also edit the settings and add a fallback URL. As you are testing, you can just provide the redirect URI as http://localhost:8888/callback. You won’t be able to execute the python scripts using Spotify APIs unless you set the redirect URI

Now you are set to access the Spotify APIs. Let’s first try to understand how to use the Spotify APIs and some of the key parameters first.

Understanding Spotify APIs

Let’s work on two scenarios to understand how we consume spotify APIs

Scenario 1

For Scenario 1, we will consume 2 APIs

Playlist API- Use this to retrieve top 50 songs of the week for 10 countries and stored it as JSON response.
Audio Features API – For all the songs retrieved from Playlist API, retrieve the audio features such as Speechiness, Energy and liveness and store it as JSON response.
Create SQL Views to create joins between these 2 API responses.
Consume the SQL Views within HDI containers as Calculation Views
Consume the Calculation view in SAP Analytics Cloud / Miscrosoft Power BI.

Let’s start with Playlist API.

In order to retrieve all tracks from Top Chart USA, we need to know the playlist ID of TOP Songs-USA as shown below. In this case for “Top Songs- USA” it is 37i9dQZEVXbLp5XoPON0wI. So if we need to work on specific public playlists for which we are going to collect all songs and its corresponding metadata, then we need to know the playlist id. I already have 10 playlist IDS for different countries which we will be using in python scripts.

So how does it with work with Playlist IDs

Here is the developer console where you can test the API endpoints. [Similar to how we test in our API hub]. All the available APIs are displayed in the left side of the screen. And let’s try the Playlists -> Get a Playlist API Endpoint

Once you select the API, you need to provide the playlist_id which is mandatory. And on the right it shows how to execute using CURL

Finally, select the OAuth token. Spotify will suggest to add if any scopes are needed. For this API , you do not need to add scopes. [Think of scopes like how we add in node js apps ]. Once the token is requested, you will see the OAuth token on screen and you can try the API.

The response would the top 50 songs in the form of json .

The response would be pretty long and would advise you to use the json viewer to navigate through the data . Once you copy the JSON in viewer and navigate to viewer, you will notice 50 track responses and navigating through every response, you can identify attributes such as track name, artist name , album name , popularity and other relevant metrics for every track.

So you know how to use playlist API to read JSON data. Also importantly you cannot ingest this whole JSON . The HANA document store will not accept it . You have to extract from track or album data to push into HANA Cloud . If you see the initial response, it starts with “collaborative”:false.. The document store will fail to recognize it as valid identifier. So we have to delve down to track or album to get the right format. We can handle that in the python script.

And the other API which I will be using in scenario 1 is “Get Audio features for a track”

So the previous API will retrieves all tracks in Top USA playlist. And the Audio features API will provide all the features such as danceability , liveness , speechiness, energy for every track. So what are these metrics? It’s very interesting on how Spotify API analyzes the audio tracks. You can refer the API document but here is the crisp explanation of these metrics.

Danceability – Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable

Speechiness – Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.

Liveness – Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live

Energy – Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

So now you know what we are going to ingest in to HANA Cloud .

Playlist API – which provides all the top 50 tracks for 10 countries .

Audio Features API – which provides all the audio features for these 500 songs

Scenario 2

If you do not want to manually provide playlist IDs, then you can use the API Get Category playlists. You can use this API and search for “workout”. It will provide all the playlist IDS for workout albums. Then you can use the scenario1 APIs to get the data.

I will be covering Scenario1 in the blogs. Now we got some intro into Spotify APIs. Now let’s get into actual python scripts which we will use to consume the Spotify APIs .

Executing Python Scripts

For Spotify APIs, we have the package Spotipy and they have code snippets to access every API. You can also refer to their github if you want to explore other scenarios. So the following code will read 10 different country playlists for a week , get all tracks as explained before in Scenario1 and ingest all into HANA Cloud as json documents .

import pandas as pd
import spotipy
import spotipy.util as util
import json
import hana_ml
from hana_ml.docstore import create_collection_from_elements as jsoncollection
import hana_ml.dataframe as dataframe


client_id = ""  #Your Client ID from Spotify
client_secret = "" #Your Client Secret from Spotify
redirect_uri = "http://localhost:8888/callback"
username = "<Your spotify username>"
scope = "user-library-read user-top-read playlist-read-private"

# Get the access token
token = util.prompt_for_user_token(
    username, scope, client_id, client_secret, redirect_uri)

# Create the Spotify client
sp = spotipy.Spotify(auth=token)
# setup hana connection
conn = dataframe.ConnectionContext(address='<Your HANA Cloud host>',  # Provide your OpenSQL Host here
                                   port=443,
                                   user='',  # Provide your  Username here
                                   password='',  # Provide your  password here
                                   encrypt='true'
                                   )

#Process all tracks and items of a specific playlist
def get_top2022_playllst(playlistid, country):
    playlist2022 = playlistid
    collname   = country
    toptracks1 = sp.user_playlist(username, playlist2022)
    tracks1 = toptracks1["tracks"]
    toptracks = tracks1["items"]
    #Call the insert function now 
    finaljson = insert_json_hcdb(toptracks,country)
    print(country + " tracks has been succesfully inserted")
    return toptracks , country


#Collect all the playlist data and insert it into HANA Cloud as Collection 
def insert_json_hcdb(toptracks1, country1):
    jsoncollection(
            connection_context=conn,
            collection_name=country1,
            elements=toptracks1,
            drop_exist_coll=True,
            schema="<Your Schema Name>"        
    )
    return country1



# Get all top playlist from different countries

playlist2022 = get_top2022_playllst('37i9dQZEVXbLp5XoPON0wI','TOPUSA1')
playlist2022 = get_top2022_playllst('37i9dQZEVXbK4fwx2r07XW','TOPAUS1')
playlist2022 = get_top2022_playllst('37i9dQZEVXbLJ0paT1JkgZ','TOPCHILE1')

playlist2022 = get_top2022_playllst('37i9dQZEVXbKPTKrnFPD0G','TOPARG1')
playlist2022 = get_top2022_playllst('37i9dQZEVXbM1EaZ0igDlz','TOPAUSTR1')
playlist2022 = get_top2022_playllst('37i9dQZEVXbMw2iUtFR5Eq','TOPDEN1')

playlist2022 = get_top2022_playllst('37i9dQZEVXbK8BKKMArIyl','TOPGER1')
playlist2022 = get_top2022_playllst('37i9dQZEVXbMWDif5SCBJq','TOPIND1')
playlist2022 = get_top2022_playllst('37i9dQZEVXbIWlLQoMVEFp','TOPNZ1')
playlist2022 = get_top2022_playllst('37i9dQZEVXbJZGli0rRP3r','TOPKOR1')

As you see in the code, lines 1-7 import the necessary packages from Spotipy an hana_ml. IN line 6, we use the method create_collection_from_elements to store the captures response as json in HANA Cloud. In lines 10-14 , you set up the necessary auth tokens . You can ignore the scope as reading public playlists doesn’t need it. Finally in line 21 you have all the necessary tokens needed to used the Spotify APIs.

And Lines 23 -28 establishes connection to SAP HANA Cloud .

I have two functions

get_top2022_playllst – Collects all tracks from a playlist with playlist and country as input.

insert_json_hcdb – inserts the playlists as collections .

In line 38, I call the insert function within get_top2022_playllst function capturing the track info .

If you remember I mentioned before that you cannot insert the whole JSON response from Spotify . So we have to navigate and insert only the JSON response with proper identifier. In line 35-36, I go through every JSON response and just take the info of items of every track and not the whole JSON response. Lets’ explore that once we have inserted successfully.

Finally here is the input which call top play lists for the week for 10 countries.

Hope you understood the flow now . This triggers the function get_top2022_playllst to get track info for every country and within the function we insert information as it processes every playlist.

Finally, I execute the code from Visual Studio and I could see the success message that all the tracklists has been ingested into SAP HANA Cloud.

Now the best part is to go and explore from HANA DB explorer, and see all the JSON collections inserted into the specified schema. When I right-click on TOPARG1 and select view JSON, it will show the ingested JSON response.

As mentioned before, you can use the JSON viewer to read the JSON response for all tracks. Let’s take the top tracks for USA . Here you see 50 JSON responses.

Let’s navigate through one response and try to understand how we can query using identifiers. If you remember, I have mentioned about not ingesting the whole response from Spotify but move only from track info . As you see down, we just push the track info which has all the necessary information.

If I further navigate through the track folder, I can get track name and popularity . Also, it will have information about album and artists as nested objects. In this case, track name is Kill Bill .

If I navigate into album or artists, I will get the album name and artist name. In this case album name is “SOS” and artist name is “SZA”.

So now we navigated through a JSON nested object and figured out which are the identifiers which will help us provide the relevant information . So if we have to extract the same from HANA Cloud now , here is the query for the same on JSON Document “TOPUSA1”

SELECT "track"."name" AS "TRACK NAME",  --check the previous screenshot to understand navigation
"track"."album"."name" AS "ALBUM NAME"
from "BTP1"."TOPUSA1"

But if I add the artist name, the response would be NULL. So, what do you think the reason is?

This is because some of the track songs have more than one artists (why not ? would love to hear Rihanna feat. Jay-Z ).

When there are deep nested objects for artists, then we need to unnest it. IN this case we have 2 artists

So now our new query would be to include artist name but to unnest them and here you go

SELECT "track"."name" AS "TRACK NAME",
"track"."album"."name" AS "ALBUM NAME",
"t"."name" as "Artist NAME"
from "BTP2"."TOPUSA1"
UNNEST "track"."album"."artists" as "t"

I want to pull images to display them in Microsoft Power BI. But there are 3 images for the album image with different height and width. SO I have to unnest the image too .

So now we want the album name , track name , multiple artists name and image .So here is the final query.

SELECT "track"."name" AS "TRACK NAME",
"track"."album"."name" AS "ALBUM NAME",
"i"."url" AS "IMAGE",
"t"."name" as "Artist NAME"
from "BTP1"."TOPUSA1"
UNNEST "track"."album"."images" as "i"
UNNEST "track"."album"."artists" as "t"
WHERE "i"."height" = 300

So we have the Playlist API responses for the 10 countries .I think this is pretty cool to have as a standard feature.

Updating JSON Documents

We need to update specific attribute for all the JSON responses which will help us group all the tracks by country. We need this as we will be joining the response with next API “Audio features”. Let’s take the Top Argentina playlist[TOPARG]. As seen below, the identifier “type” is always ‘track’ for all the responses.

In order to join the JSON response for playlists with audio features, I will be updating this value to respective country identifiers .

UPDATE  "BTP2"."TOPARG1"  SET "track"."type" = 'TOPARG';

And you should be doing it for all the playlist for different countries.

Once updated, validate once to make sure it is updated. Try to open the JSON response for one of the country playlists in JSON viewer. In this case, you will see the value updated as ‘TOPARG’

And we are all done with Playlist APIs ! Let’s move on to the next API.

Audio Features API

As mentioned before, we will be extracting audio features for the playlists from previous APIs. So here is the script for the same.

import pandas as pd
import spotipy
import spotipy.util as util
import json
import hana_ml
from hana_ml.docstore import create_collection_from_elements as jsoncollection
import hana_ml.dataframe as dataframe

client_id = ""  #Insert your client credentials
client_secret = ""  #Insert your client credentials
redirect_uri = "http://localhost:8888/callback"
username = "<Your Spotify Username>"
scope = "user-library-read user-top-read playlist-read-private"

# Get the access token
token = util.prompt_for_user_token(
    username, scope, client_id, client_secret, redirect_uri)

# Create the Spotify client
sp = spotipy.Spotify(auth=token)
# setup hana connection
conn = dataframe.ConnectionContext(address='Your HANA Cloud Host',  
                                   port=443,
                                   user='',      #Your HANA Cloud credentials
                                   password='',  #Your HANA Cloud credentials
                                   encrypt='true'
                                   )

#Process all tracks and items of a specific playlist
def get_top2022_playllst(playlistid, country):
    playlist2022 = playlistid
    collname   = country
    toptracks1 = sp.user_playlist(username, playlist2022)
    tracks1 = toptracks1["tracks"]
    toptracks = tracks1["items"]
    track_ids = []
    track_names = []
    track_artists = []
    for i in range(0, len(toptracks)):
        if toptracks[i]['track']['id'] != None: 
            track_ids.append(toptracks[i]['track']['id'])
            track_names.append(toptracks[i]['track']['name'])
            track_artists.append(toptracks[i]['track']['artists'])
        features = []
    for i in range(0,len(track_ids)):
        audio_features = sp.audio_features(track_ids[i])[0]
        track_popularity = {'popularity': sp.track(track_ids[i])['popularity']}
        genre = {'genres': sp.artist(track_artists[i][0]['uri'])['genres']}
        audio_features = dict(audio_features, **track_popularity, **genre)
        features.append(audio_features)
        #af = json.dump(audio_features,track_ids)
        #print(af)
        
    #Call the insert function now 
    finaljson = insert_json_hcdb(features,country)
    print(country + " tracks has been succesfully inserted")
    return audio_features , country


#Collect all the playlist data and insert it into HANA Cloud as Collection 
def insert_json_hcdb(audio_features1, country1):
    #print(audio_features1)
    jsoncollection(
            connection_context=conn,
            collection_name=country1,
            elements=audio_features1,
            drop_exist_coll=True,
            schema="BTP"        
    )
    return country1



# Use the same playlist IDs that we used for Playlist APIs
# Please do not change the input as we will be joining both the  responses

playlist2022 = get_top2022_playllst('37i9dQZEVXbLp5XoPON0wI','TOPUSAAUD')
playlist2022 = get_top2022_playllst('37i9dQZEVXbK4fwx2r07XW','TOPAUSAUD')
playlist2022 = get_top2022_playllst('37i9dQZEVXbLJ0paT1JkgZ','TOPCHILEAUD')
playlist2022 = get_top2022_playllst('37i9dQZEVXbKPTKrnFPD0G','TOPARGAUD')
playlist2022 = get_top2022_playllst('37i9dQZEVXbM1EaZ0igDlz','TOPAUSTRAUD')
playlist2022 = get_top2022_playllst('37i9dQZEVXbMw2iUtFR5Eq','TOPDENAUD')

playlist2022 = get_top2022_playllst('37i9dQZEVXbK8BKKMArIyl','TOPGERAUD')
playlist2022 = get_top2022_playllst('37i9dQZEVXbMWDif5SCBJq','TOPINDAUD')
playlist2022 = get_top2022_playllst('37i9dQZEVXbIWlLQoMVEFp','TOPNZAUD')
playlist2022 = get_top2022_playllst('37i9dQZEVXbJZGli0rRP3r','TOPKORAUD')

I have used the same functions

get_top2022_playllst – Extended to loop around every soundtrack to extract audio features

insert_json_hcdb – Inserts the JSON response as the audio response features

Let’s check our DB explorer and you will see corresponding audio JSON collections are added for every playlist. And you will see all the audio features as mentioned . You can read here to know more on audio features. Now for audio features for Argentina, we see all the features. Also we have to update one of the identifier here for Country name as we will be using this response for joining in SQL views. So we will be updating the “type” identifier to “TOPARG”.

UPDATE "BTP2"."TOPARGAUD" SET "type" = 'TOPARG'

Let’s check once to make sure we updated the right identifier.

Similarly, we will be updating audio feature responses for other country playlists too.

So we are done with API responses and updated with necessary adjustments. Now we are ready to create SQL views based on JSON responses.

Creating SQL Views

So you should be having 20 JSON Collections now . And for every country you will have JSON collections for playlist response and audio features .

If you don’t have two JSON Collections for every country, then 🙂

Let me show the example for creating SQL views for one country. And you need to do the same for all the countries.

We will be creating 3 SQL views for every country

TOPARG – Selecting Columns with identifiers for playlists information

ARG_AUDIO – Selecting Columns with identifiers for audio identifiers

ARGENTINA – Joining these 2 based on trackid and country grouping

SQL for TOPARG View:
=======================
CREATE VIEW "BTP2"."TOPARG" ( "TRACK NAME", "TRACKLISTNAME", "TRACK ID", "ALBUM NAME", "IMAGE", "Artist NAME" ) 
AS (SELECT "track"."name" AS "TRACK NAME", "track"."type" as "TRACKLISTNAME",
  "track"."id" AS "TRACK ID",
"track"."album"."name" AS "ALBUM NAME",
"i"."url" AS "IMAGE",
"t"."name" as "Artist NAME"
from "BTP"."TOPARG1"
UNNEST "track"."album"."images" as "i"
UNNEST "track"."album"."artists" as "t"
WHERE "i"."height" = 300)

SQL for ARG_AUDIO View:
======================
CREATE VIEW "BTP2"."ARG_AUDIO" ( "ID", "DANCEABILITY", "SPEECHINESS", "ENERGY", "POPULARITY", "LIVENESS", "AUDIOTYPE" ) 
AS (SELECT  "id" AS "ID", "danceability" AS "DANCEABILITY", "speechiness"as "SPEECHINESS", "energy" as "ENERGY","popularity" AS "POPULARITY"
, "liveness" as "LIVENESS", "type" as "AUDIOTYPE" FROM "BTP"."TOPARGAUD")

SQL for joining both these views
=======================================
CREATE VIEW "BTP2"."ARGENTINA1" ( "TRACK NAME", "TRACKLISTNAME", "TRACK ID", "ALBUM NAME", "IMAGE", "Artist NAME", "DANCEABILITY", "SPEECHINESS", "ENERGY", "POPULARITY", "LIVENESS" ) AS (SELECT 
	"TRACK NAME",
	"TRACKLISTNAME",
	"TRACK ID",
	"ALBUM NAME",
	"IMAGE",
	"Artist NAME",
	TO_DECIMAL("DANCEABILITY",13 ,3),
	TO_DECIMAL("SPEECHINESS",13,4),
	TO_DECIMAL("ENERGY",13,3),
	TO_INTEGER("POPULARITY"),
	TO_DECIMAL("LIVENESS",13,4)
FROM "BTP"."TOPARG"
INNER JOIN "BTP"."ARG_AUDIO"
ON "TRACK ID" = "ID" AND "TRACKLISTNAME" = "AUDIOTYPE")

Assuming the final view(ARGENTINA1) got activated, you should see all the columns in a single view having all playlist information , audio features and images.

You have to execute those 3 SQLs for all the 10 countries. Finally you will have 10 SQL views for all the countries

Once you have all the 10 SQL Views, you will create one final SQL view with UNION ALL of all this SQL views for 10 countries. You could either do this UNION in SAP Business Application Studio[SAP BAS] or here. Both would work.

CREATE VIEW TOPALL AS (
SELECT * FROM "BTP3"."ARGENTINA"
UNION ALL
SELECT * FROM "BTP3"."AUSTRALIA"
UNION ALL
SELECT * FROM "BTP3"."AUSTRIA"
UNION ALL
SELECT * FROM "BTP3"."CHILE"
UNION ALL
SELECT * FROM "BTP3"."DENMARK"
UNION ALL
SELECT * FROM "BTP3"."GERMANY"
UNION ALL 
SELECT * FROM "BTP3"."INDIA"
UNION ALL 
SELECT * FROM  "BTP3"."KOREA"
UNION ALL 
SELECT * FROM "BTP3"."NEWZEALAND"
UNION ALL
SELECT * FROM "BTP3"."USA"
)

So we have the final view with metrics for all countries and we can group by TRACKLISTNAME column.

Building HDI Containers with Business Application Studio[BAS]

In order to start with BAS subscription, you can refer the BAS onboarding tutorial. In case you are looking for steps to clone and how to use the code, we have explained in our Discovery missions . Here are the steps you follow assuming you have the subscription and logged into your Space from BAS.

You can clone our repository from your BAS. Here is the link to our repository.

Once you clone, you will notice the folder structure

hdbgrants file with grant access to schema where all JSON collections and SQL Views are created. Please make sure the schema name mentioned in the grants file is same as the schema name you have in your SAP HANA Cloud.
hdbsynonym file with one synonym for the TOPALL sql view that we created previously.
Calculation View consuming the TOPALL sql view for frontend reporting
hdbrole with role that should be granted to a DB user executing the report from SAP Analytics Cloud

Within the calculation view, you will see a calculated column based on measure “Speechiness“. I had to do this as the Web API documentation mentions – Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks. We will need this column for the visualization . We will cover that visualization part in the next blog.

As mentioned before, the hdbrole file has a role “json::TRACKSALL” that will have access to Calculation view. Once the project is deployed successfully, the role will be deployed in SAP HANA Cloud. You can create a HANA DB user and grant access to the role. This will be the business user executing from SAC or Microsoft Power BI.

Once you have created the DB users with the access to Calculation View, we are now ready to build SAC Story. We will cover the visualization in the next blog which will be published in couple of days. Our next blog will include

Ingesting JSON responses as large objects in SAP Datasphere.
Querying the large objects using JSON functions and building SQL views
Consuming the SQL Views using Data Builder in SAP Datasphere.
Data Visualization in SAC using Trend Charts & R computations
Data Visualization in Microsoft Power BI

And of course there will be follow up blogs covering other multi-model capabilities based on different Spotify API scenarios. Please do share your feedback regarding the scenarios and approach we took. And if you have issues related to access git repositories or issues with the code, do let us know. Happy learning 🙂

References:

Tags: Open Source Python SAP Business Application Studio SAP Business Technology Platform SAP HANA Cloud SAP HANA multi-model processing

Sara Sampaio

Author Since: March 10, 2022

0 0 votes

Article Rating

0 Comments

Inline Feedbacks

View all comments