Testing Techniques
Performance testing is crucial to ensure that the Document Reader SDK can efficiently handle the demands of real-world applications. By evaluating key performance metrics, you can identify the optimal configuration for various environments. This guide provides an overview of how to test and optimize the performance of the Document Reader SDK, including setup instructions, measurement techniques, and result analysis.
Before You Start
Performance Metrics
Regula Document Reader SDK is a client-server system with communication over HTTP. You can use the following metrics for system performance evaluation:
- RPS (Requests Per Second or 'throughput') is the number of requests the system can handle per second.
- Latency is the time the system takes to respond, measured in milliseconds (ms).
- Applications is the number of simultaneous applications (or users) the system can serve.
- Workers is the number of worker processes the current hardware can run on a single instance.
Areas under Test
The Document Reader SDK provides multiple endpoints, each with different functionality. To identify the endpoints relevant to your business needs, refer to the Document Reader SDK OpenAPI documentation.
The test framework provides functionality for assessing all the endpoints' performance.
Essential Testing Setup
Before starting the performance testing, check out the product licensing information and contact our sales team to ensure you have the necessary license.
See the capacity testing tips below.
- Locust is used as a load tool.
Note that Locust's performance can be limited by the machine's outgoing bandwidth and CPU. For example, when tests involve sending images of several megabytes each, the RPS may not exceed 10 (even with 100 active Locust workers) because the load-generating machine is fully loaded.
- Locust sends requests to the Document Reader SDK instance.
- This instance includes Prometheus for collecting system metrics; for details, refer to the Monitoring documentation.
- Grafana is used for results visualization.
We recommend running the Document Reader SDK and the Locust-based test load system on separate machines to ensure that test load generation does not affect the actual test results. See the following scheme.
For performance testing, you'll need to prepare a copy of the working test setup. For detailed instructions, follow the guide on GitHub. Note that this task should be performed by a qualified specialist.
To make sure that network speed is not a bottleneck in your testing, use speedtest.net or similar tools.
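As a starting point, the load script can be a single Locust user class that posts a Base64-encoded document image to the processing endpoint. The sketch below is a minimal, hypothetical example: the endpoint path, the request-body fields (processParam, List, ImageData), and the scenario name are assumptions that must be checked against the Document Reader SDK OpenAPI documentation for your version.

```python
# locustfile.py — minimal load-test sketch (the request shape is an assumption;
# verify the field names and the endpoint path against the OpenAPI documentation).
import base64
from locust import HttpUser, task, between

# Hypothetical sample image from your test data set.
with open("sample_document.jpg", "rb") as f:
    IMAGE_B64 = base64.b64encode(f.read()).decode("ascii")

class DocReaderUser(HttpUser):
    # Pause between requests made by each simulated 'application'.
    wait_time = between(0.5, 1.5)

    @task
    def process_document(self):
        # Assumed request body for the Process endpoint.
        self.client.post(
            "/api/process",
            json={
                "processParam": {"scenario": "FullProcess"},
                "List": [{"ImageData": {"image": IMAGE_B64}}],
            },
        )
```

Run it, for example, with `locust -f locustfile.py --host http://<sdk-host>:8080`, setting the number of users (applications) and the spawn rate in the Locust web UI or via the `-u` and `-r` options.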
Defining Optimal Workload
Performance depends on many parameters and varies from environment to environment. Therefore, for the best results, you need to measure your performance in real conditions.
Start with the simplest configuration, a single worker, to understand the characteristics of the specific machine; these results serve as the basis for further configuration design. Next, perform the measurements and obtain the optimal latency and RPS values.
To determine how many Document Reader SDK operations a server can perform, find the maximum load the server can handle. For this, launch a gradually increasing number of test 'applications' and measure the latency during the process:
- Run Locust against the required endpoint (for example, `Process`) with an increasing load (number of 'applications'): for example, 1, 4, 7, 10 applications. A headless sweep sketch follows this list.
- For every run, collect the latency and RPS results and then chart them. The goal is to find the optimal latency/RPS value, which should be visible as a peak in the chart.
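To automate the sweep, you can drive Locust in headless mode for each load level and read the aggregated latency and RPS from the CSV statistics it writes. The following sketch assumes the locustfile shown earlier, a hypothetical host address, and the default Locust CSV column names, which may differ between Locust versions.

```python
# sweep.py — run Locust headless for several 'application' counts and print
# the aggregated RPS and average latency for charting (column names follow
# recent Locust versions and may need adjusting).
import csv
import subprocess

HOST = "http://docreader-host:8080"  # hypothetical SDK instance address

for users in (1, 4, 7, 10):
    prefix = f"run_{users}"
    subprocess.run(
        [
            "locust", "-f", "locustfile.py", "--headless",
            "--host", HOST,
            "-u", str(users), "-r", str(users),
            "--run-time", "3m",
            "--csv", prefix,
        ],
        check=True,
    )
    # Locust writes <prefix>_stats.csv; the "Aggregated" row sums all endpoints.
    with open(f"{prefix}_stats.csv", newline="") as f:
        for row in csv.DictReader(f):
            if row["Name"] == "Aggregated":
                print(users, row["Requests/s"], row["Average Response Time"])
```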
The following characteristic graph, obtained as a result of the measurements, shows a request queue forming on the Regula Document Reader SDK after a certain number of 'application' loads:
- Zone 1: The area where the system is underloaded, allowing an increase in incoming requests without increasing the response time.
- Zone 2: The area where the system is overloaded, causing the response time to increase as requests wait in the queue.
- Zone 3: The area where different workers compete for resources, leading to a gradual increase in response time. This is the optimal zone for selecting operational parameters.
If your business requires more simultaneous 'applications' or a higher RPS number, refer to the multi-worker configuration described in the next section. If the latency does not meet business requirements, choose a higher-performance instance.
Scaling Up Optimization
To define the optimal workload when scaling up, measure the performance of a single worker, and then define the desired worker number based on the formula given in Environment Recommendations:
Worker count = target throughput × latency
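For the units to cancel, the latency in this formula must be expressed in seconds. As a purely illustrative example, if a single worker shows an average latency of 1.5 s under optimal load and the target throughput is 20 RPS, you would need 20 × 1.5 = 30 workers.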
Next, using the memory requirements, you can estimate the number of workers that can run on a single instance, see Vertical Scaling. However, be aware that since multiple workers will share the computational resources of the same machine, the overall performance may be less than optimal.
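As an illustration, if Vertical Scaling specifies M GB of RAM per worker and the instance has R GB of memory available, at most about R / M workers fit on that instance; the practical worker count is then the smaller of this memory-bound estimate and the number given by the formula above.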
We recommend conducting a performance test on a single machine with the chosen number of workers. The methodology is similar to that described in the previous section. If computational resources are sufficient, the performance graph will resemble the one shown in the previous section. If resources are insufficient, increasing the number of incoming requests will result in a decrease in RPS and an increase in latency, as shown in Zone 1 of the graph below:
To identify what is causing the bottleneck, use a monitoring system. If computational resources are insufficient, consider either reducing the number of workers or selecting a machine with better performance to achieve the best results.