
Deployment

Proxy Guard

Document Reader SDK is not a general-purpose HTTP web server that handles hundreds of requests per second. Typical document processing takes up to a few seconds, so the workload is CPU-intensive. Since a typical instance runs 1-4 workers, workers' time has to be managed carefully.

One of the main ways workers' processing time gets wasted is the slow client problem. When the web service receives a request from a client with a slow internet connection, the request is assigned to a free worker. That worker is then bottlenecked by the speed of the client connection and stays blocked until the slow client finishes sending a large ID image. While blocked, the worker process can't handle any other request; it sits idle, waiting to receive the entire request before it can actually start processing it.

We strongly recommend running the Document Reader behind a proxy server. Although many HTTP proxies are available, we strongly advise using Nginx. If you choose another proxy server, make sure that it buffers slow clients.

Danger

Typical load balancers from cloud providers, such as ELB/ALB from AWS, do not buffer slow clients. So, you still need a proxy server between the load balancer and the Document Reader.
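
For reference, a minimal Nginx configuration that buffers client uploads before forwarding them to the Document Reader might look like the sketch below. It assumes the web service listens on localhost:8080, as in the health check example further down; the listen port and the body-size limits are placeholders to adjust to your deployment and typical image sizes.

server {
    listen 80;

    location / {
        # Nginx reads and buffers the whole request body before proxying,
        # so a slow upload never occupies a Document Reader worker.
        # Buffering is the default behaviour; it is stated here explicitly.
        proxy_request_buffering on;

        # Allow large ID images; keep small bodies buffered in memory.
        client_max_body_size    20m;
        client_body_buffer_size 1m;

        proxy_pass http://127.0.0.1:8080;
    }
}

With this in place, a cloud load balancer can point at port 80 of the instance, and the Document Reader workers only ever receive fully buffered requests.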

Health Check and Requests Queue Size

Configuring an application's health check is not a trivial topic, and the behaviour of our web service makes it even more nuanced. On the one hand, it is as simple as querying one HTTP endpoint, http://localhost:8080/api/ping, which produces simple JSON output:

{
  "app-name": "Regula Document Reader Web API",
  "license-id": "00000000-0000-0000-0000-000000000000",
  "license-serial": "OL00000",
  "server-time": "2021-06-28 09:16:00.453891+00:00",
  "valid-until": "2022-12-31T00:00:00Z",
  "version": "5.6.128414.450"
}

However, there are a few factors that can cause a group of consecutive checks to fail. One of the issues our customers face most often is overload at peak times.

Consider the following scenario:

  1. A web server with one worker runs behind a load balancer
  2. The web server backlog is 20 requests; the load balancer performs a health check every 30 seconds with a timeout of 30 seconds
  3. 20 requests come in; 17 take about 1 second each, and 3 contain a bad document image that extends processing to 5 seconds
  4. The health check request comes in and gets queued (as the 21st request)
  5. To process the first 20 requests, the server needs 17 × 1 + 3 × 5 = 32 seconds
  6. After a while, the health check caller (the load balancer) hits its 30-second timeout
  7. The load balancer decides the application is broken, marks the instance unhealthy, and terminates it or stops routing requests to it

This can happen with any number of web servers behind a load balancer: the overloaded host is terminated, the remaining hosts receive more requests, their health checks start timing out as well, and they get removed too.

To fix that, use the following options, or a combination of them (an example of applying them to an AWS load balancer follows the list):

  1. Increase the health check timeout to 60 seconds or even higher. The trade-off is that it takes longer to discover genuinely stuck nodes; if the web server on a node crashes outright, the health check still fails fast with a connection error.
  2. Decrease the backlog size to 10 requests and let excess requests from the load balancer fail, triggering the load balancer to shift traffic to another instance earlier. As an empirical rule, set the backlog size to roughly health check timeout / 3 (a 30-second timeout gives the backlog of 10 suggested here).
  3. Increase the number of consecutive failed health checks required for the load balancer to remove a node from routing.
  4. Increase the health check period to 60 seconds or even higher. If the load spike is not constant, the server then has more time to drain the backlog before the next queued health check.
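
As an illustration of options 1, 3, and 4, the corresponding knobs on an AWS ALB target group can be adjusted with the AWS CLI. This is only a sketch: the target group ARN is a placeholder, and the exact values should follow the sizing logic above (keep the timeout smaller than the interval, hence 90/60 here).

# Relax the health check of the target group that fronts the Document Reader.
aws elbv2 modify-target-group \
    --target-group-arn arn:aws:elasticloadbalancing:...:targetgroup/... \
    --health-check-path /api/ping \
    --health-check-interval-seconds 90 \
    --health-check-timeout-seconds 60 \
    --unhealthy-threshold-count 5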