Introduction:
In my last blog, we discussed how to ingest data from SAP to Azure Data Lake using SAP DS.
Please click the link below for the details:
How to ingest data from SAP to Azure Data Lake using SAP DS
This blog is a continuation of that blog, and it focuses further on the difficulties we had transferring data from SAP table/extractor to Azure Data Lake using SAP DS.
Requirement:
We utilize SAP DS to load data from an SAP table or extractor into Azure Data Lake. We encountered a few issues when transferring data from SAP to Azure Data Lake using SAP DS.
Issues listed below are compatible with SAP and have a fix:
Unhidden tab in the SAP table’s data caused a data issue in the DS file:
When we load data from an SAP table, there are occasionally some fields that contain data with tabs that cause data issues. When this happens, the job fails with the error below:
Approach 1:
To fix this, we may use the regex_replace function to replace the tab with a blank and remove any concealed tabs or other characters.
Please click the following link for additional information:
All the characters mentioned in the ‘regular expression pattern string’ will be replaced with the ‘replacement string’.
Approach 2:
The functions regex_replace, replace_substr, and translate are inoperable for a few tables. In this instance. We must exclude inaccurate records as our only choice to tackle this. We keep track of errors in a file called an error log file.
We enable error handling at the source table level by selecting the overflow file checkbox and entering the file location where the error entries will be loaded.
Where you want to put the error records file must be specified, along with a file path and file name. The error log file will be updated with error entries; however, these records won’t replace any existing ones; rather, they will be appended to the existing file.
How many entries are being put into the blob file and how many of those have incorrect records may both be seen in the trace log.
Language’s characters issue in SAP table caused data issue in DS:
When we load data from a table into a flat file that also contains data in languages other than English, we can receive junk data for those languages.
The data disparity between the file generated by SAP DS and SAP is depicted in the image above. After importing the Chinese data from SAP into a csv file, the ARTXT field contains junk data.
Solution:
Several file format configuration modifications are required to resolve the issue.
- UTF-8 must be selected as the code page.
- We must select YES for write BOM.
For more detail please go through the below link:
Input/output properties in file format
Issue while loading delta for 0CO_OM_WBS_7 in DS:
With init without delta transfer, it is not possible to load the delta in DS for 0CO OM WBS 7. For more information, please see the SAP notes listed below:
1458612 – Datasources with/without Init simulation
The strategy listed below was used to tackle the aforementioned problem.
- Both full load and delta load are handled by extractor with CDC. In order to keep the job functioning as a delta init with data transfer, we are maintaining the initial load setting of Yes and the Extract from datetime setting of blank for full load.
- For delta load we are keeping the settings we use for delta load. i.e. the initial load setting is No and Extract from datetime setting is $g_load_date.
Below mentioned issues are compatible with ADL:
If SAP fields and table/extractor name consist of “/”:
If any table/extractor fields or table/extractor names contain the character /, this is unacceptable in Azure Data Lake and must be changed to” _”. In DS, it is simple to import this kind of table/extractor, however in ADL, we must rename the field names and the table/extractor names.
Issue with date field which has SAP data for the year <1900
ADL only retrieves data beyond the year 1900 due to year constraint. Because it requires an ADL global settings modification that will have an impact on a downstream project, it cannot be handled by ADL.
Using decode(), we can manage it in DS for both full and delta load.
For both the full and delta jobs, we are utilizing the following decode() mapping on the date field to replace years that are less than 1900. It will use the following function to change the value to January 1, 1900:
decode (year(Field Name)< 1900, ‘1900.01.01’, “Field Name”)
For more detail on decode please go through the below link:
decode
We are updating below record:
Summary:
I hope that would be helpful. This is precisely how we tackled the problems. The procedures discussed above can be used to fix problems with any tables or extractors.
Please refer below links for related topics:
Input/output properties in file format
1458612 – Datasources with/without Init simulation
Please feel free to share the feedback and your thoughts in a comment, stay tuned and follow my profile for the further updates.