Testing Framework

Performance testing is crucial to ensure that the Face SDK can handle the demands of real-world applications efficiently. By evaluating key performance metrics, we can identify the optimal configuration for various environments. This guide provides an overview of how to test and optimize the performance of the Face SDK, including setup instructions, measurement techniques, and result analysis.

Before You Start

Performance Metrics

Regula Face SDK is a client-server system with communication over HTTP. You can use the following metrics for system performance evaluation:

  • RPS (Requests Per Second or 'throughput') is the number of requests the system can handle per second.
  • Latency is the time the system takes to respond to a request, in milliseconds (ms).
  • Applications is the number of simultaneous client applications the system can serve.
  • Workers is the number of Face SDK worker processes the current hardware can run.

Areas Under Test

The Face SDK provides multiple endpoints, each with different functionality. Test the endpoints that are relevant to your business needs. The full list of endpoints is available in the Face SDK OpenAPI documentation.

The test framework provides functionality for assessing all endpoints.

Essential Testing Setup

Before starting performance testing, contact our sales team to ensure you have the necessary license.

  • Locust is used as the load tool; a minimal locustfile sketch is shown after this list.

    Note that Locust can be limited by the load-generator machine's outgoing bandwidth and CPU. For example, when tests involve sending images of several megabytes, the RPS may not exceed 10 even with 100 workers because the load-generator machine is saturated.

  • Locust sends requests to the Face SDK instance.

  • This instance includes Prometheus, which collects system metrics; refer to the Monitoring documentation.
  • Grafana is used for results visualization.
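A minimal locustfile sketch is shown below. The endpoint path (/api/detect), the request body, the test image name, and the host address are illustrative assumptions; take the exact request contract from the Face SDK OpenAPI documentation.

```python
# locustfile.py - minimal sketch; the endpoint path and payload are assumptions,
# verify them against the Face SDK OpenAPI documentation.
import base64

from locust import HttpUser, task, between


class FaceSdkUser(HttpUser):
    # Each simulated user plays the role of one 'application'.
    wait_time = between(0.1, 0.5)

    def on_start(self):
        # Read and encode the test image once per user so that request
        # generation stays cheap on the load-generator machine.
        with open("face.jpg", "rb") as f:
            self.image_b64 = base64.b64encode(f.read()).decode()

    @task
    def detect(self):
        # Hypothetical request body; the real schema is defined in the OpenAPI spec.
        self.client.post("/api/detect", json={"image": self.image_b64})
```

A run against an assumed Face SDK address could then be started with `locust -f locustfile.py --host http://face-sdk-host:41101 -u 4 -r 1`, where `-u` sets the number of simultaneous 'applications'.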

We recommend splitting the Face SDK and the Locust-based Test Load system onto different machines to ensure that load generation does not affect the test results.

Splitting Face SDK and Locust-based Test Load system to different machines

For performance testing, you'll need to prepare a copy of the working test setup. Follow the guide on GitHub. Note that this task should be performed by a qualified specialist.

To make sure that network speed is not a bottleneck in your testing, you can use speedtest.net or similar tools.

Defining Optimal Workload

Performance depends on many parameters and varies from environment to environment. Therefore, for the best results, you need to measure your performance in real conditions.

Start with the simplest configuration of one worker to understand the characteristics of the specific machine; these results will guide further configuration design. Next, perform measurements to obtain the optimal values for latency and RPS.

To determine how many Face SDK operations a server can perform, find the server's maximum throughput. To do this, apply load to the server with different numbers of 'applications' and measure latency during the process (a sweep sketch follows the list below):

  1. Run Locust against the required endpoint (for example, Liveness) with different loads, that is, different numbers of 'applications': for example, 1, 4, 7, and 10.
  2. For every run, collect the latency and RPS results and chart them. The goal is to find the best value of latency/RPS, which appears as a peak in the chart.
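The sweep can be automated, as sketched below: Locust runs headless at several 'application' counts, and the aggregated RPS and median latency are read from the CSV files Locust writes. The host address, run time, and user counts are illustrative assumptions.

```python
# sweep.py - sketch of a load sweep; the host, run time, and user counts are
# illustrative assumptions. Requires the locust CLI and the locustfile.py above.
import csv
import subprocess

for users in (1, 4, 7, 10):
    prefix = f"run_{users}"
    subprocess.run(
        [
            "locust", "-f", "locustfile.py",
            "--headless",
            "--host", "http://face-sdk-host:41101",  # assumed Face SDK address
            "-u", str(users), "-r", str(users),
            "--run-time", "5m",
            "--csv", prefix,
        ],
        check=True,
    )
    # The last row of <prefix>_stats.csv holds the aggregated results of the run.
    with open(f"{prefix}_stats.csv") as f:
        aggregated = list(csv.DictReader(f))[-1]
    print(users, aggregated["Requests/s"], aggregated["Median Response Time"])
```

Charting the printed latency and RPS values against the number of 'applications' gives the graph discussed below.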

Below is a characteristic graph obtained from these measurements; it shows requests queuing up on the Regula Face SDK once the 'applications' load exceeds a certain level:

Analysing workload

  • Zone 1: The area where the system is underloaded, allowing an increase in incoming requests without an increase in the response time.

  • Zone 2: The area where different workers compete for resources, leading to a gradual increase in response time. This is the optimal zone for selecting operational parameters.

  • Zone 3: The area where the system is overloaded, causing the response time to grow as requests wait in the queue.

If your business needs more simultaneous 'applications' or a higher RPS number, refer to the multi-worker configuration described in the next section. If the latency does not meet business requirements, choose a higher-performance instance.

Optimal Workload When Scaling Up

To define the optimal workload when scaling up, measure the performance of a single worker, and then derive the desired number of workers from the formula given in Environment Recommendations:

Worker count = target throughput (RPS) × latency (in seconds)
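As a worked example with illustrative numbers (not measured values): a target throughput of 10 RPS and a measured single-worker latency of 800 ms (0.8 s) give 10 × 0.8 = 8 workers.

```python
# Worked example of the sizing formula; the inputs are illustrative, not measured.
import math

target_rps = 10.0          # desired throughput, requests per second
latency_s = 800 / 1000     # measured single-worker latency: 800 ms = 0.8 s
workers = math.ceil(target_rps * latency_s)
print(workers)             # -> 8
```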

Next, using the memory requirements, you can estimate the number of workers that can run on a single instance; see Vertical Scaling. However, be aware that since multiple workers share the computational resources of the same machine, the overall performance may be less than optimal. Additionally, when using a GPU, the CPU might become a bottleneck if it cannot handle the required number of threads.

We recommend conducting a performance test on a single machine with the chosen number of workers. The methodology is the same as described in the previous section. If computational resources are sufficient, the performance graph will resemble the one shown above. If resources are insufficient, increasing the number of incoming requests will result in a decrease in RPS and an increase in latency; see Zone 1 in the graph below:

Defining the number of workers

To identify what is causing the bottleneck, use a monitoring system. If computational resources are insufficient, consider either reducing the number of workers or selecting a machine with better performance to achieve the best results.

How to Collect Results

To evaluate the testing results, use the Grafana Face API Prometheus and Locust Prometheus Monitoring dashboards:

Dashboards

Use the Face API Prometheus Dashboard to verify Regula Face SDK server load. The goal is to ensure that the Face SDK server is not limited by system resources. Verify that memory and CPU are not at their limits. If system resources are insufficient, reduce the load or change the server.

Face API Prometheus Dashboard: GPU and CPU views
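If you prefer to check resource saturation outside Grafana, you can query Prometheus directly through its HTTP API. The sketch below assumes Prometheus is reachable at an address such as http://monitoring-host:9090 and that node-level CPU metrics in the node_exporter style are being scraped; substitute the metric names your monitoring setup actually exposes.

```python
# Sketch: read average CPU utilisation from Prometheus over its HTTP API.
# The Prometheus address and the node_exporter-style metric are assumptions.
import requests

PROMETHEUS = "http://monitoring-host:9090"
QUERY = '100 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100'

resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": QUERY}, timeout=10)
resp.raise_for_status()
result = resp.json()["data"]["result"]
cpu_busy_pct = float(result[0]["value"][1]) if result else 0.0
print(f"Average CPU utilisation over the last 5 minutes: {cpu_busy_pct:.1f}%")
```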

The Locust Prometheus Monitoring dashboard shows the progress of the load generated by the Locust load system. You can check the current RPS, the number of load requests completed, and the average response time. Assess the results according to your business needs. If performance is insufficient, review the Face API Prometheus Dashboard or reduce the load.

Locust Prometheus Monitoring dashboard: Average Response Time panel