Now that I am clearer on my goal – predicting a general performance issue (i.e. an incident so severe that it gets noticed by almost all users) – I can get more technical.
The most fundamental decision is about the data source. Where do I get reliable data about my SAP systems? In fact, the requirements are pretty high. The input data should be:
- high quality/accuracy
- high quantity
- low latency
- high diversity (ideally a mix of SAP, database, OS, SAN and LAN measurements)
In my case, the decision was quite easy. The SAP systems are monitored by SAP Focused Run, so Simple Diagnostics Agents were already providing performance metrics about the managed SAP systems to a central database. While FRUN provides various metrics, the diversity is somewhat limited, focusing on SAP Basis and HANA metrics. The biggest obstacle was the retention time: in our case, the Focused Run system kept the data for only 30 days, so I needed to extract it to a local database with a longer retention period.
The central data store of Focused Run is the table MAI_UDM_STORE. It contains the actual performance measurements, together with other monitoring observations. The column CATEGORY of MAI_UDM_STORE distinguishes the following record types:
- PERFORM (this is the data I am interested in)
- SELFMON (error messages from the monitoring agents or FRUN)
- AVAIL (availability measurements)
- EXCEPTIONS (error messages from the monitored SAP systems)
- CONFIGURE (detection of misconfigurations)
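
To get a first feel for how these categories are distributed, a simple aggregation is enough. A minimal sketch – the schema name is a placeholder for the actual FRUN schema:

```sql
-- Record count per category; only the PERFORM rows are relevant
-- for the prediction use case. "SAP_FRUN" is a placeholder schema name.
SELECT CATEGORY, COUNT(*) AS RECORD_COUNT
  FROM "SAP_FRUN"."MAI_UDM_STORE"
 GROUP BY CATEGORY
 ORDER BY RECORD_COUNT DESC;
```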
In typical SAP fashion, the data in the table is not readable on its own: the measurements reference various generated IDs, which have to be resolved through auxiliary tables. In the end, I had to join 6 tables to get some human-readable output (a query sketch follows the list):
- MAI_UDM_STORE (raw data, still containing various generated IDs)
- ACCONTEXTDIR (contains the actual SAP SID)
- ACEVENTDIR (contains the event names)
- ACMETRICDIR (contains the metric names)
- MAI_UDM_PATHS (contains details about the monitoring trees in FRUN)
- HDB_METRIC_VALID (required only for the HANA metrics, to retrieve a description for them)
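
The resulting query looks roughly like the sketch below. The *_ID join columns and the readable name columns are illustrative placeholders, not the real FRUN column names; the shape is the point: one fact table decorated with a handful of directory lookups.

```sql
-- Illustrative sketch only: all *_ID join columns and the readable
-- name columns are placeholders, not the actual FRUN column names.
SELECT ctx.SID           AS SYSTEM_ID,
       evt.EVENT_NAME,
       met.METRIC_NAME,
       pth.PATH_NAME,
       hdb.DESCRIPTION   AS HANA_METRIC_DESCRIPTION,
       udm.MEASURED_AT,
       udm.VALUE
  FROM "SAP_FRUN"."MAI_UDM_STORE"  AS udm
  JOIN "SAP_FRUN"."ACCONTEXTDIR"   AS ctx ON ctx.CONTEXT_ID = udm.CONTEXT_ID
  JOIN "SAP_FRUN"."ACEVENTDIR"     AS evt ON evt.EVENT_ID   = udm.EVENT_ID
  JOIN "SAP_FRUN"."ACMETRICDIR"    AS met ON met.METRIC_ID  = udm.METRIC_ID
  JOIN "SAP_FRUN"."MAI_UDM_PATHS"  AS pth ON pth.PATH_ID    = udm.PATH_ID
  LEFT JOIN "SAP_FRUN"."HDB_METRIC_VALID" AS hdb            -- only matches for HANA metrics
         ON hdb.METRIC_NAME = met.METRIC_NAME
 WHERE udm.CATEGORY = 'PERFORM';
```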
The SAP FRUN system is used primarily for monitoring our SAP systems, so I was not allowed to put any additional workload onto it. A lightweight HANA-to-HANA database interface was easy to create, and with it I could replicate the data I needed to another HANA database for analysis. About 2 GB of data – just shy of 200 million rows – gets transferred each calendar day.
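
One way to set up such a lightweight HANA-to-HANA link is HANA Smart Data Access (SDA); the sketch below illustrates the idea, with host, port, credentials, schema names and the timestamp column as placeholders.

```sql
-- Sketch of a HANA-to-HANA replication via Smart Data Access (SDA);
-- this is one possible implementation, not necessarily the one used here.
-- Host, port, credentials and schema names are placeholders.
CREATE REMOTE SOURCE "FRUN_SRC" ADAPTER "hanaodbc"
  CONFIGURATION 'Driver=libodbcHDB.so;ServerNode=frunhost:30015'
  WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=REPL_USER;password=********';

-- Expose the FRUN table in the analysis database as a virtual table.
CREATE VIRTUAL TABLE "ANALYSIS"."VT_MAI_UDM_STORE"
  AT "FRUN_SRC"."<NULL>"."SAP_FRUN"."MAI_UDM_STORE";

-- Daily load into a local copy with a longer retention period;
-- the timestamp column name is a placeholder.
INSERT INTO "ANALYSIS"."MAI_UDM_STORE_HIST"
SELECT *
  FROM "ANALYSIS"."VT_MAI_UDM_STORE"
 WHERE CATEGORY    = 'PERFORM'
   AND MEASURED_AT >= ADD_DAYS(CURRENT_UTCDATE, -1);
```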
Luckily, I am a data engineer, so creating that data flow was not very difficult. My repository is now filling up, and analyzing the stash is what I will tackle next.