It all started with a little project I was working on to synchronize thousands of records from SuccessFactors into a HANA database.
(Yes, I know there are more automated options than writing something yourself, but if you know SuccessFactors you also know that no two SF systems' APIs look the same.)
The structure of my solution is as follows:
- Microservice #1 merely exposes an OData API on top of my generic data model (sketched below). The business layer does nothing but receive data from the POST request and write it to the database.
- Microservice #2 is a NodeJS application that reads the data from the SuccessFactors OData API and calls microservice #1 to write the data to the database.
Between the two services I want to optimize performance, and possibly run multiple instances of microservice #2 to transfer different entities at the same time.
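To make the setup concrete, here is a minimal sketch of what microservice #1's data model and service could look like. The entity name Records and its fields are hypothetical placeholders for illustration, not the actual generic data model from the project.

```cds
// a hypothetical single-file sketch; entity and field names are placeholders
namespace load;

entity Records {
  key ID         : UUID;
      entityName : String;
      payload    : LargeString;
}

// the CAP framework exposes this service as an OData API, which is all microservice #1 does
service LoadService {
  entity Records as projection on load.Records;
}
```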
Options
- Sequential
The obvious option is to post my records sequentially to the target service. This is the most straightforward approach, but it is also the slowest.
- Parallel
The second option is to post my records in parallel. It is not a bad idea, but you quickly have to figure out how many parallel requests you can make before you start to get errors. Errors can materialize in the express stack, in the BTP (which assumes you are executing a DoS attack), or in the processing of the database requests when you run out of connections in the connection pool.
- Batch
The third option, which I thought would be my solution, was a batch request. I could take all my records and post them in one request. This would be the most efficient way to transfer the data, but as the name indicates and the specification states, the server receives the information as a batch yet still processes the individual requests sequentially.
- BatchParallel
A fourth option, which I never implemented, would be a combination of options 2 and 3: you could tune both the batch size and the number of parallel connections.
- Custom REST
The option the CAP development team recommended is without a doubt the best one. You add a generic REST endpoint to your service (microservice #1) that receives the data itself plus the name of the entity you want to write to, and you call it with the complete dataset. The implementation then performs a single, optimized insert into the database with the complete dataset (a sketch follows below).
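To illustrate the Custom REST option, here is a minimal sketch of what such a generic endpoint could look like in microservice #1. The route path /bulk/:entity, the 10mb body limit and the response shape are my own assumptions for illustration; the actual implementation (and the error handling I skipped) lives in the Reference Server linked below.

```js
// srv/server.js — hedged sketch of a generic bulk-insert endpoint in a CAP app
// (route name, body limit and response shape are assumptions, not the reference implementation)
const cds = require('@sap/cds')
const express = require('express')
const { INSERT } = cds.ql

cds.on('bootstrap', (app) => {
  // use a dedicated JSON parser with a larger limit than the 100 kb express default
  app.post('/bulk/:entity', express.json({ limit: '10mb' }), async (req, res) => {
    try {
      const db = await cds.connect.to('db')
      // one INSERT with the complete dataset instead of one request per record
      await db.run(INSERT.into(req.params.entity).entries(req.body))
      res.status(201).json({ inserted: req.body.length })
    } catch (err) {
      // proper error handling is skipped here, as noted in the observations
      res.status(500).json({ error: err.message })
    }
  })
})

module.exports = cds.server
```

Microservice #2 then sends the complete record set for one entity in a single POST to this endpoint, which is what makes the single optimized insert possible.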
Preliminary Test Results
You can find the details here
Observations
- SAP CAP is a great framework for providing easy OData access to your data model.
- If you use CAP as a mere backend for UI apps, you might never have to worry about this.
- As expected, sequential single-request processing is the slowest approach. The problem is amplified once network latency becomes a factor.
- Batch processing and parallel processing are good ways to improve performance, but they require additional effort in tuning the connection pool. As our detailed test results show, the default connection pool settings are not optimal for high-volume throughput and lead to various errors (getaddrinfo ENOTFOUND, 502 – Bad Gateway, 503 – Service Unavailable). A sample pool configuration is sketched after this list.
- The custom REST endpoint is the fastest and most efficient approach, but it requires additional effort to implement and maintain. As of CAP version 6.4 you must patch CAP to allow for larger request bodies; you can find details in the description of the Reference Server. In this implementation I skipped proper error handling.
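For reference, connection pool settings in a CAP project typically live in package.json under cds.requires.db. The snippet below is only a sketch: the option names come from the underlying generic-pool library and the values are illustrative guesses, not tested recommendations, so check the CAP Pool Configuration documentation for the authoritative options.

```json
{
  "cds": {
    "requires": {
      "db": {
        "kind": "hana",
        "pool": {
          "min": 10,
          "max": 100,
          "acquireTimeoutMillis": 5000,
          "maxWaitingClients": 500
        }
      }
    }
  }
}
```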
Test Environment
To test the performance of the different options I created a simple test environment.
- A reference service that allows the simulation of a CAP service with a single-entity data model: https://github.com/RizInno/cds-load-refsrv
- A test app that can put load on the reference service: https://github.com/RizInno/cds-load-test
The boundaries
- ECONNRESET on too many parallel requests
When I increase the number of concurrent connections to >= 550, running iterations in approximately 5-second cycles, a few calls execute successfully, but after a few iterations I get an 'ECONNRESET' error when establishing the connection to the server. There seems to be a challenge on the express side when hitting 1000 parallel requests (a client-side mitigation is sketched after this list).
See this StackOverflow question for additional details: https://stackoverflow.com/questions/53340878/econnreset-in-express-js-node-js-with-multiple-requests
- BTP DoS attack prevention
When you have microservice #2 running locally and #1 deployed to the BTP, you will run into DoS attack prevention. The BTP blocks requests from the same IP at about 900 requests. This usually materializes in a client-side error: getaddrinfo ENOTFOUND
- 503 – Service Unavailable
Service Unavailable is usually an indication of the connection pool running out of connections when you rapid-fire parallel requests. You can adjust the pool configuration to give yourself more room to maneuver; details are described in the standard CAP Pool Configuration documentation.
- 502 – Bad Gateway
I have a case where I received a Bad Gateway, but I am still investigating the cause and mitigation.
- REST endpoint size limit in CAP (as of version 6.4)
When I write approximately 200 records, I hit a current limit in the CAP REST endpoint. The request is rejected with the error 'PayloadTooLargeError: request entity too large':
    expected: 140910,
    length: 140910,
    limit: 102400,
    type: 'entity.too.large'
You can find details in the description of the Reference Server.
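Given these boundaries, the practical mitigation on the client side is to cap the number of in-flight requests. The following is a minimal sketch of how microservice #2 could do that; the target URL, the batch size of 50 and the use of the global fetch API (Node 18+) are assumptions for illustration, not the actual cds-load-test implementation.

```js
// hedged sketch: post records with a bounded number of parallel requests
// TARGET and BATCH_SIZE are hypothetical values, tune them against the limits above
const TARGET = 'https://example.com/odata/v4/load/Records'
const BATCH_SIZE = 50

async function postRecord(record) {
  const res = await fetch(TARGET, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(record)
  })
  if (!res.ok) throw new Error(`POST failed with status ${res.status}`)
}

async function loadAll(records) {
  // process the records in slices so at most BATCH_SIZE requests are in flight at once
  for (let i = 0; i < records.length; i += BATCH_SIZE) {
    const slice = records.slice(i, i + BATCH_SIZE)
    await Promise.all(slice.map(postRecord))
  }
}

module.exports = { loadAll }
```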