Troubleshooting
Please review this page carefully before contacting our support team. You can find more information on the FAQ page.
If you haven't found a solution and need technical support, submit a request and describe the issue in as much detail as possible, including the steps you took to solve or reproduce it. Please also attach the log file, which can be found in the Web API installation directory. This will help us find and fix the issue. Thank you!
Bad license
- Check if the regula.license file is in the expected location. The absolute path should be as follows:
``` /app/extBin/unix/regula.license ```
- Check if the container has access to the internet, at least to the https://lic.regulaforensics.com/ URL. In case a direct internet connection is not an option, please configure Docker to use the proxy server.
- Check if the license type is "On-line" (supplied within the OLXXXXX.zip archive, not the SLXXXXX.zip one).
- Check if the license is sufficient to request the needed number of workers. Each license is issued for a specific number of workers; thus, it is not possible to run the container if the number of requested workers exceeds the number defined in the license.
- Contact us to check whether the license has expired or been deactivated.
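The first two checks in this list can be scripted. The following is a minimal sketch, not an official tool: the helper names `license_file_present` and `license_server_reachable` are hypothetical, and it assumes a Python 3 interpreter is available in the environment where you run it. The path and URL are the ones given above.

```python
import os
import urllib.error
import urllib.request

# Values taken from the checklist above.
LICENSE_PATH = "/app/extBin/unix/regula.license"
LICENSE_SERVER = "https://lic.regulaforensics.com/"


def license_file_present(path: str = LICENSE_PATH) -> bool:
    """Return True if the license file exists and is not empty."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def license_server_reachable(url: str = LICENSE_SERVER, timeout: float = 10.0) -> bool:
    """Return True if an HTTPS request to the license server gets any answer.

    Any HTTP response, including an error status, counts as reachable,
    because it shows that DNS, routing, and TLS all work.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, so the network path is fine
    except (urllib.error.URLError, OSError):
        return False
```

Calling `license_file_present()` and `license_server_reachable()` from inside the container gives a quick yes/no answer for each check. `urllib` picks up the standard proxy environment variables, so the second check also exercises a configured proxy. A positive result only shows that the server can be reached; it says nothing about whether the license itself is valid.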
Liveness/Readiness Probes
Use the /api/ping endpoint to monitor the container's health and availability.
To check the license status, use the /api/readiness endpoint.
Choose the interval between probes carefully. An interval that is too short can flood the workers and increase processing time. An interval that is too long can delay the detection of a malfunctioning container.
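As an illustration, both endpoints can be polled with a small script. This is only a sketch under stated assumptions: the base URL `http://localhost:41101` must be adjusted to your deployment, the helper names `probe` and `watch` are hypothetical, and it is assumed (not confirmed here) that an unhealthy or unlicensed container answers with a non-2xx status or does not answer at all.

```python
import time
import urllib.error
import urllib.request

BASE_URL = "http://localhost:41101"  # assumption: adjust to your deployment


def probe(path: str, base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if GET base_url + path answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def watch(interval_s: float = 30.0, rounds: int = 3, base_url: str = BASE_URL) -> list:
    """Probe liveness and readiness `rounds` times, sleeping between rounds."""
    results = []
    for i in range(rounds):
        results.append((probe("/api/ping", base_url), probe("/api/readiness", base_url)))
        if i < rounds - 1:
            time.sleep(interval_s)
    return results
```

Each entry in the list returned by `watch` is a `(liveness, readiness)` pair. The `interval_s` argument is where the trade-off described above shows up: a small value means more probe requests competing with real traffic, a large value means slower detection.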
Permanent container restart
- Check if the allocated RAM is sufficient (at least 3 GB of RAM per worker is required).
- Check if the container has access to the internet, at least to the https://lic.regulaforensics.com/ URL. In case a direct internet connection is not an option, please configure Docker to use the proxy server.
Failing health check
Configuring an application health check is not a trivial task, and the behaviour of our web service makes it even more nuanced. On the one hand, it is as simple as querying a single HTTP endpoint, http://localhost:41101/api/ping, which produces simple JSON output:
{
"app-name": "Regula Face Recognition Web API",
"license-id": "00000000-0000-0000-0000-000000000000",
"license-serial": "OL00000",
"server-time": "2021-06-28 09:16:00.453891+00:00",
"valid-until": "2022-12-31T00:00:00Z",
"version": "3.1.304.273"
}
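A monitoring script can read this output to warn before the license runs out. The sketch below parses the sample response shown above; the field names come from that sample, while the helper name `days_until_expiry` is hypothetical.

```python
import json
from datetime import datetime, timezone

# Sample response copied from the output above.
SAMPLE = """{
  "app-name": "Regula Face Recognition Web API",
  "license-id": "00000000-0000-0000-0000-000000000000",
  "license-serial": "OL00000",
  "server-time": "2021-06-28 09:16:00.453891+00:00",
  "valid-until": "2022-12-31T00:00:00Z",
  "version": "3.1.304.273"
}"""


def days_until_expiry(ping_body: str, now: datetime) -> int:
    """Return the whole days left before "valid-until"; negative once expired."""
    info = json.loads(ping_body)
    # "valid-until" ends with "Z" (UTC). Replace it with an explicit offset so
    # that datetime.fromisoformat also accepts it on Python versions before 3.11.
    valid_until = datetime.fromisoformat(info["valid-until"].replace("Z", "+00:00"))
    return (valid_until - now).days


if __name__ == "__main__":
    print(days_until_expiry(SAMPLE, datetime.now(timezone.utc)))
```

Passing `now` explicitly keeps the function easy to test; in production you would pass `datetime.now(timezone.utc)` as in the last line.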
On the other hand, a few factors can cause a series of consecutive checks to fail. One of the issues our customers face most often is overload at peak times.
Consider the following scenario:
- A web server with one worker sits behind a load balancer
- The web server backlog is 20 requests, and the load balancer performs a health check every 30 sec with a timeout of 30 sec
- 20 requests come in; 3 of them contain a bad image, which extends the processing of each to 5 sec
- The health check request comes in and gets queued (as 21st request)
- To process the first 20 requests, the server needs 17 x 1 + 3 x 5 = 32 sec
- After a while, the health check caller (load balancer) times out
- The load balancer thinks your application is broken, marks the instance broken, and terminates it or stops routing requests
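The arithmetic in this scenario can be reproduced with a few lines of code. This is a simplified model, assuming a single worker that processes the queue in order, 1 sec for a normal request (as implied by the calculation above), and 5 sec for a request with a bad image. The function names are hypothetical.

```python
def queue_drain_time(normal: int, slow: int,
                     normal_s: float = 1.0, slow_s: float = 5.0) -> float:
    """Seconds a single worker needs to process the queued requests in order."""
    return normal * normal_s + slow * slow_s


def health_check_times_out(wait_s: float, timeout_s: float) -> bool:
    """True if a health check queued behind the backlog waits past its timeout.

    The processing time of the health check itself is ignored.
    """
    return wait_s > timeout_s


if __name__ == "__main__":
    wait = queue_drain_time(normal=17, slow=3)
    print(wait, health_check_times_out(wait, timeout_s=30))
```

With the numbers from the scenario, the health check waits 32 sec behind the backlog, which exceeds the 30 sec timeout.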
This can happen with any number of web servers behind a load balancer. An overloaded host is terminated, which sends more requests to the other hosts, causing their health checks to time out and those hosts to be removed too.
To fix that, we can use the following options (or a combination of them):
- Increase the health check timeout to 60s or even higher. The trade-off is that it takes longer to discover nodes that are genuinely stuck. If a web server on a given node crashes, the health check still fails fast with a connection error.
- Decrease the backlog size to 10 requests and let requests from a load balancer fail, triggering the load balancer to shift traffic to another instance earlier. In general, use the empirical formula "health check timeout / 3" to determine the desired backlog size.
- Increase the number of consecutive health check fails required to trigger the load balancer to remove a node from routing.
- Increase the health check period to 60s or even higher. Thus, if the spike in load is not constant, the server has more time to clear the backlog before the next health check is queued.
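The options above pull in opposite directions, so it can help to put numbers on them. The sketch below applies the empirical backlog formula and adds a rough estimate of how long a stuck node may stay in rotation. The estimate is an assumption of this example, not a documented load balancer behaviour: it supposes that checks start one period apart, that each failing check takes the full timeout, and that the node is removed after the required number of consecutive failures. Both function names are hypothetical.

```python
def suggested_backlog(health_check_timeout_s: int) -> int:
    """Backlog size from the empirical rule: health check timeout / 3."""
    return max(1, health_check_timeout_s // 3)


def detection_delay(period_s: float, timeout_s: float, failures_required: int) -> float:
    """Rough upper bound, in seconds, before a stuck node is removed.

    Assumed model: the node gets stuck right after a successful check, then
    `failures_required` checks start one period apart and the last one runs
    until its timeout.
    """
    return failures_required * period_s + timeout_s


if __name__ == "__main__":
    print(suggested_backlog(30))
    print(detection_delay(period_s=60, timeout_s=60, failures_required=3))
```

Under the empirical rule, a full backlog drains within the health check timeout as long as the average processing time stays at or below 3 sec per request. Raising the timeout, the period, or the failure threshold makes the health check more tolerant of load spikes, at the cost of a longer detection delay for nodes that are really stuck.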