Hi Team
I am running a simple bool term query on an index in ES. It took 33152ms to respond.
Request -
POST    http://elastic_server_dns/index_name/_doc/_search
Body
    {
        "query": {
            "bool": {
                "must": [
                    {
                        "term": {
                            "orderNo": "abcdeffjfnjd"
                        }
                    }
                ]
            }
        }
    }
Attaching the response I got -
The first request took a whopping 33s to search a single index.
The same request, repeated afterwards, takes 2-3 ms at most. Why is this happening? Is there something wrong with my config?
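One way to narrow down where the 33s goes is to compare the `took` value in the search response (the time measured inside Elasticsearch, in milliseconds) with the wall-clock time seen by the client. The following is only a minimal sketch using the Python standard library: the URL is the placeholder from the request above, and `build_query` and `timed_search` are hypothetical helper names, not part of any Elasticsearch client.

```python
import json
import time
import urllib.request

# Placeholder endpoint copied from the request above; replace with a real one.
SEARCH_URL = "http://elastic_server_dns/index_name/_doc/_search"


def build_query(order_no):
    """Return the same bool/term query body as in the request above."""
    return {"query": {"bool": {"must": [{"term": {"orderNo": order_no}}]}}}


def timed_search(url, body, timeout=120):
    """Hypothetical helper: POST the query and return (took_ms, wall_ms)."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    wall_ms = (time.monotonic() - start) * 1000.0
    # "took" is the time reported by Elasticsearch itself, in milliseconds.
    return payload["took"], wall_ms
```

Calling `timed_search(SEARCH_URL, build_query("abcdeffjfnjd"))` several times in a row would show whether the first call is the only slow one. If `took` is small while the wall-clock time is large, the delay is more likely outside Elasticsearch (for example in the ingress or the network).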
A basic idea of my config:
 - 10 data nodes, each with 512 GB disk and 6 GB RAM, with 50% of RAM allocated as heap to the Java process
 - 3 master-eligible nodes
 - Date-based indices; the request above was made against the previous day's index
 - The same request against other indices takes much less time (5 ms max)
 - Disk usage is currently around 6-7% on each data node
 - This happens very intermittently; while typing this question, I can no longer reproduce it
 - The system using this cluster currently sends about 100 requests per minute, which is very low for ES
 - Output of free -m on one of the data nodes (it is similar on all the others):

                  total        used        free      shared  buff/cache   available
    Mem:           6946        3943         268           0        2734        2635
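To put the memory and disk figures above side by side, here is a rough back-of-the-envelope calculation in Python. It assumes that "50% heap" means half of the node's total RAM, that 1 GB = 1024 MB, and that the 6-7% disk usage is mostly index data; none of these are confirmed, so treat the result as an estimate only.

```python
def parse_free_mem(line):
    """Parse the 'Mem:' row of `free -m` into a dict of integers (MB)."""
    names = ["total", "used", "free", "shared", "buff_cache", "available"]
    fields = line.split()
    if fields[0] != "Mem:":
        raise ValueError("expected a line starting with 'Mem:'")
    return dict(zip(names, (int(v) for v in fields[1:7])))


mem = parse_free_mem("Mem: 6946 3943 268 0 2734 2635")

# Assumption: "50% heap" means half of the node's total RAM goes to the JVM heap.
heap_mb = mem["total"] * 0.5

# Disk figures from the list above: 512 GB per node, 6-7% used.
# Assumption: GB is converted to MB with a factor of 1024.
data_low_mb = 512 * 0.06 * 1024
data_high_mb = 512 * 0.07 * 1024

# How many times larger the on-disk data is than the current page cache.
ratio_low = data_low_mb / mem["buff_cache"]
ratio_high = data_high_mb / mem["buff_cache"]

print(f"estimated heap: {heap_mb:.0f} MB")
print(f"page cache (buff/cache): {mem['buff_cache']} MB")
print(f"data on disk is roughly {ratio_low:.1f}x to {ratio_high:.1f}x the page cache")
```

Under these assumptions, only a fraction of each node's data can sit in the operating system's page cache at once. That does not prove the cause of the slow first request, but it is a number worth keeping in mind when asking whether 6 GB of RAM per node is enough.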
Please let me know what other data points are required.
Update -
I have set the timeout at my ingress level to 60s. After 60s, I get errors like this -
2019/08/27 13:26:02 [error] 39#39: *633505 upstream timed out (110: Connection timed out) while reading response header from upstream, client: xxx.xxx.xxx.xx, server: _, request: "POST /prefix-2019-08-26/_doc/_search HTTP/1.1", upstream: "http://xx.xxx.xx.x:9200/prefix-2019-08-26/_doc/_search", host: "xx.xx.xxx.x"
which means the request sometimes takes even longer than 60s.
How can I debug this? I have set the log level to debug on the ES data nodes and master nodes, but I am not getting any useful information there.
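One option that may give more targeted information than debug-level node logs is the search slow log, which is configured per index through index settings. The sketch below only builds the request in Python; the setting names are Elasticsearch index settings, while the threshold values and the helper names `slowlog_settings` and `settings_request` are hypothetical and would need tuning for this workload. The index name is the one that appears in the ingress log above.

```python
import json


def slowlog_settings(query_warn="10s", query_info="5s", fetch_warn="1s"):
    """Build a settings body for the search slow log.

    The threshold values are hypothetical defaults chosen for illustration.
    """
    return {
        "index.search.slowlog.threshold.query.warn": query_warn,
        "index.search.slowlog.threshold.query.info": query_info,
        "index.search.slowlog.threshold.fetch.warn": fetch_warn,
    }


def settings_request(index_name):
    """Return (method, path, body) for applying the settings to one index."""
    return "PUT", f"/{index_name}/_settings", json.dumps(slowlog_settings())


method, path, body = settings_request("prefix-2019-08-26")
print(method, path)
print(body)
```

Sending that body to the printed path should make each shard log searches that exceed the thresholds, which helps tell a slow query phase from a slow fetch phase.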
Update on issue
I monitored the cluster properly and found that almost all the data nodes have very high CPU usage (99%-100% on 7-8 nodes out of 10), and that is the reason for the slowness. Below is a screenshot of one of the nodes.
Can the small amount of RAM on these machines be the reason for this?
Each node has 6 GB RAM, of which 49% is allocated to the ES Java process, and the heap is almost full on all the nodes (see the top output).
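To see what the busy nodes are actually spending CPU on, the nodes hot threads API (`GET /_nodes/hot_threads`) returns a plain-text report of the hottest threads per node. A minimal sketch of calling it from Python follows; the base URL is the placeholder host from the request above, and `hot_threads_url` and `fetch_hot_threads` are hypothetical helper names.

```python
from urllib.parse import urlencode
import urllib.request


def hot_threads_url(base_url, threads=3, interval="500ms"):
    """Build the URL for the nodes hot threads API."""
    query = urlencode({"threads": threads, "interval": interval})
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{query}"


def fetch_hot_threads(base_url, timeout=30):
    """Fetch the plain-text hot threads report (requires cluster access)."""
    with urllib.request.urlopen(hot_threads_url(base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Only builds and prints the URL; no request is sent here.
print(hot_threads_url("http://elastic_server_dns"))
```

If the report is dominated by garbage collection or merge threads rather than search threads, that would point toward heap pressure or indexing load rather than the term query itself.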

