Performance Results
The results below come from our performance testing of the Liveness and Match functionalities on typical AWS server configurations. Use them as reference numbers.
The results describe the performance of a tested Kubernetes node configuration under load. Each test was performed with a single pod running on the tested node. Depending on the instance size, the pod was configured with the maximum number of Face SDK service workers that could fit within the available resources.
For system performance evaluation, the following metrics are used:
- Profile is the performance preset used for the measurement: either Min Latency (the setup with the lowest observed latency for the instance) or High Throughput (the setup with the highest observed RPS for the instance).
- RPS (requests per second, or throughput) is the total number of requests per second served by the tested instance configuration.
- Latency is the response time of the system, in ms.
- Virtual Users is the number of concurrent virtual users generating load during the test.
- Workers is the number of Face SDK service workers running inside the single tested pod.
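These metrics are not independent. If the load generator is closed-loop (each virtual user sends its next request as soon as the previous one returns) and Latency is the mean response time, Little's law gives virtual users ≈ RPS × latency in seconds. Both conditions are assumptions here, since the test description does not state them. The minimal sketch below checks the relation against three rows of the Liveness GPU table; the helper name `implied_virtual_users` is introduced only for illustration.

```python
def implied_virtual_users(rps: float, latency_ms: float) -> float:
    """Concurrency implied by Little's law: L = throughput * time in system."""
    return rps * latency_ms / 1000.0


# (instance, profile, RPS, latency in ms, reported virtual users),
# copied from the Liveness GPU table.
ROWS = [
    ("g4dn.xlarge", "Min Latency", 2.57, 1553.74, 4),
    ("g4dn.xlarge", "High Throughput", 3.74, 3206.77, 12),
    ("g6.2xlarge", "High Throughput", 11.19, 3573.91, 40),
]

for instance, profile, rps, latency_ms, reported in ROWS:
    implied = implied_virtual_users(rps, latency_ms)
    print(f"{instance} {profile}: implied {implied:.2f}, reported {reported}")
```

For these rows the implied concurrency lands within a few hundredths of the reported Virtual Users value, so the same relation can be used as a rough sanity check when comparing your own load test output with the tables.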
AWS Instance Reference Results
Liveness
GPU
| Instance size | Profile | RPS | Latency, ms | Virtual Users | Workers |
|---|---|---|---|---|---|
| g4dn.xlarge | Min Latency | 2.57 | 1553.74 | 4 | 4 |
| g4dn.xlarge | High Throughput | 3.74 | 3206.77 | 12 | 4 |
| g4dn.2xlarge | Min Latency | 3.73 | 1610.37 | 6 | 6 |
| g4dn.2xlarge | High Throughput | 5.16 | 3872.85 | 20 | 6 |
| g5.2xlarge | Min Latency | 5.72 | 1398.17 | 8 | 8 |
| g5.2xlarge | High Throughput | 9.74 | 3694.91 | 36 | 8 |
| g6.2xlarge | Min Latency | 6.03 | 1326.87 | 8 | 8 |
| g6.2xlarge | High Throughput | 11.19 | 3573.91 | 40 | 8 |
CPU
| Instance size | Profile | RPS | Latency, ms | Virtual Users | Workers |
|---|---|---|---|---|---|
| m7a.xlarge | Min Latency | 0.41 | 7378.83 | 3 | 3 |
| m7a.xlarge | High Throughput | 0.45 | 20051.08 | 9 | 3 |
| m8i.2xlarge | Min Latency | 0.65 | 9271.91 | 6 | 6 |
| m8i.2xlarge | High Throughput | 0.81 | 22100.41 | 18 | 6 |
Match
GPU
| Instance size | Profile | RPS | Latency, ms | Virtual Users | Workers |
|---|---|---|---|---|---|
| g4dn.xlarge | Min Latency | 5.60 | 714.07 | 4 | 4 |
| g4dn.xlarge | High Throughput | 10.39 | 1924.29 | 20 | 4 |
| g4dn.2xlarge | Min Latency | 7.59 | 790.55 | 6 | 6 |
| g4dn.2xlarge | High Throughput | 13.40 | 2986.05 | 40 | 6 |
| g5.2xlarge | Min Latency | 11.88 | 673.31 | 8 | 8 |
| g5.2xlarge | High Throughput | 25.34 | 1420.92 | 36 | 8 |
| g6.2xlarge | Min Latency | 11.91 | 671.86 | 8 | 8 |
| g6.2xlarge | High Throughput | 28.93 | 1382.45 | 40 | 8 |
CPU
| Instance size | Profile | RPS | Latency, ms | Virtual Users | Workers |
|---|---|---|---|---|---|
| m7a.xlarge | Min Latency | 0.95 | 3167.81 | 3 | 3 |
| m7a.xlarge | High Throughput | 1.05 | 8585.20 | 9 | 3 |
| m8i.2xlarge | Min Latency | 1.51 | 3967.29 | 6 | 6 |
| m8i.2xlarge | High Throughput | 1.59 | 11352.79 | 18 | 6 |
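The two profiles for an instance can be read as a trade-off: how much extra throughput High Throughput buys, and how much latency it costs, relative to Min Latency. The sketch below computes both ratios for two instances from the Match tables; the function name `profile_tradeoff` is a hypothetical helper, not part of any SDK.

```python
from typing import Tuple

Row = Tuple[float, float]  # (RPS, latency in ms)


def profile_tradeoff(min_latency: Row, high_throughput: Row) -> Tuple[float, float]:
    """Return (throughput gain, latency increase) as High Throughput / Min Latency ratios."""
    rps_lo, latency_lo = min_latency
    rps_hi, latency_hi = high_throughput
    return rps_hi / rps_lo, latency_hi / latency_lo


# Values copied from the Match GPU and Match CPU tables.
MATCH = {
    "g6.2xlarge (GPU)": ((11.91, 671.86), (28.93, 1382.45)),
    "m7a.xlarge (CPU)": ((0.95, 3167.81), (1.05, 8585.20)),
}

for name, (lo, hi) in MATCH.items():
    gain, cost = profile_tradeoff(lo, hi)
    print(f"{name}: throughput x{gain:.2f}, latency x{cost:.2f}")
```

For these rows, g6.2xlarge gains roughly 2.4 times the throughput for roughly 2.1 times the latency, while m7a.xlarge gains only about 1.1 times the throughput for about 2.7 times the latency. On the measured CPU instances, adding concurrent users mostly adds queueing time.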
High Performance Configurations
Below are reference performance results for a high-performance configuration. Use these numbers as a baseline when estimating system capacity and planning infrastructure.
Hardware configuration
- GPU: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition
- CPU: INTEL(R) XEON(R) GOLD 6554S, 90 cores
- RAM: 120 GB
Traffic profile
In this configuration, RPS is the total throughput achieved by the full tested system under the traffic profile described below.
- Continuous load with 50 virtual users, split 70/30 between Liveness and Match.
| Profile | RPS | Latency (Liveness), ms | Latency (Match), ms | Workers |
|---|---|---|---|---|
| High Throughput | 36 | 1575.94 | 889.5 | 28 |
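As a rough illustration of using such a baseline for capacity planning, the sketch below estimates how many identical systems would be needed for a target load. It assumes throughput scales linearly with the number of systems and that each one is loaded to only a fraction of its measured RPS. The 100 RPS target, the 70% headroom factor, and the function name `nodes_needed` are illustrative assumptions, not measured values or recommendations.

```python
import math


def nodes_needed(target_rps: float, node_rps: float, headroom: float = 0.7) -> int:
    """Systems required when each is loaded to at most headroom * node_rps.

    Assumes linear scaling across systems, which real deployments may not achieve.
    """
    if node_rps <= 0:
        raise ValueError("node_rps must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    if target_rps <= 0:
        return 0
    return math.ceil(target_rps / (node_rps * headroom))


# 36 RPS is the measured total from the table above; 100 RPS is a hypothetical target.
print(nodes_needed(target_rps=100, node_rps=36))                # with 70% headroom
print(nodes_needed(target_rps=100, node_rps=36, headroom=1.0))  # no headroom
```

The measured 36 RPS applies to the 70/30 Liveness/Match mix only. A different traffic mix would need its own measurement before an estimate like this is meaningful.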