A high number of simultaneous searches increases request response times


We have 3 warm zones with 27 indices, each with 2 shards. Each zone has up to 2.2 vCPU and 8 GB RAM. Together they hold billions of documents, and each index holds approximately 50 GB of data. Hosting is on the ES cloud, using the default AWS setup, so we can access settings only through the APIs.

The data is indexed correctly and sorted, and after our changes we reduced query times from seconds or minutes to milliseconds, or a few seconds at most. So each search request is very fast on its own. We only search on this cluster; we do not index.

The problem begins when a high volume of search requests comes in and traffic is more intense. Handling many concurrent requests is slow: response times reach up to 30 seconds, which affects the production environment. At those moments, CPU and memory usage are not very high; I would say normal.
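One way to tell whether searches are waiting for a free thread rather than for CPU is to look at the search thread pool during a traffic spike. Below is a minimal sketch, assuming the `_cat/thread_pool` API is reachable on this deployment. The sample response, the node names and the helper `saturated_nodes` are invented for illustration and are not output from our cluster.

```python
import json

# Hypothetical sample of what
#   GET _cat/thread_pool/search?format=json&h=node_name,name,active,queue,rejected
# might return. Node names and numbers are made up for illustration.
SAMPLE = """[
  {"node_name": "warm-0", "name": "search", "active": "4", "queue": "37", "rejected": "0"},
  {"node_name": "warm-1", "name": "search", "active": "2", "queue": "0", "rejected": "0"}
]"""


def saturated_nodes(cat_json: str) -> list:
    """Return the nodes whose search pool has queued or rejected requests."""
    rows = json.loads(cat_json)
    # The cat API returns numbers as strings in JSON format, so convert them.
    return [
        row["node_name"]
        for row in rows
        if int(row["queue"]) > 0 or int(row["rejected"]) > 0
    ]


print(saturated_nodes(SAMPLE))
```

If the queue is non-zero on the warm nodes while CPU stays moderate, that would suggest requests are spending their time waiting in the queue.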

The OS section of the node info for the warm zones looks like this:

"os": {
    "refresh_interval_in_millis": 1000,
    "name": "Linux",
    "pretty_name": "CentOS Linux 7 (Core)",
    "arch": "amd64",
    "version": "4.15.0-1035-aws",
    "available_processors": 16,
    "allocated_processors": 2
}

Checking the thread pool settings, we see a queue size of 1000 for all pools but a size (number of threads) of only 4. Do you think we need to increase the number of threads in order to handle more searches at once? If yes, how can we change this setting? Could increasing RAM solve the issue? Or where are we making a mistake?
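As a rough check on those numbers: the documented default for the fixed `search` thread pool is `int((allocated processors * 3) / 2) + 1`, which gives 4 for `allocated_processors: 2`. The sketch below reproduces that arithmetic and adds a back-of-the-envelope estimate of queue wait time. The function names, the 100 queued tasks and the 1 second per task are assumptions for illustration, not measurements.

```python
def default_search_pool_size(allocated_processors: int) -> int:
    """Documented default size of the `search` pool: int((n * 3) / 2) + 1."""
    return (allocated_processors * 3) // 2 + 1


def rough_queue_wait_seconds(queued: int, threads: int, avg_task_seconds: float) -> float:
    """Crude estimate: queued tasks drain `threads` at a time, each taking avg_task_seconds."""
    return queued / threads * avg_task_seconds


print(default_search_pool_size(2))            # 4, matching the observed thread count
print(rough_queue_wait_seconds(100, 4, 1.0))  # 25.0 (hypothetical inputs)
```

Under these assumed inputs, a request at the back of the queue would wait on the order of tens of seconds, which is in the same range as the 30-second responses we see.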

If you have any other ideas for improvements, please don't hesitate to share.

Thank you!
