Hi Team
I am running a simple bool query with a single term clause on an index in ES. It took 33152 ms to respond.
Request -
POST http://elastic_server_dns/index_name/_doc/_search
Body -
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "orderNo": "abcdeffjfnjd"
          }
        }
      ]
    }
  }
}
Attaching the response I got -
The first request took a whopping 33 s to search a particular index.
The same request run subsequently takes 2-3 ms at most. Why is this happening? Is there something wrong with my config?
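To pinpoint where the slow request spends its time, I am planning to rerun the same query with the profile option enabled (a minimal sketch; the index name and field are the same placeholders as above):

POST http://elastic_server_dns/index_name/_doc/_search
{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        { "term": { "orderNo": "abcdeffjfnjd" } }
      ]
    }
  }
}

The profile section of the response breaks the time down per shard and per query component; time spent waiting in the search queue is not included there, so a large gap between took and the profiled times would itself be a hint.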
A basic idea of my config -
- 10 data nodes, each with 512 GB disk and 6 GB RAM, with 50% of the RAM allocated to the Java heap
- 3 master-eligible nodes
- Date-based indices; the above request was made against the previous day's index
- Trying the same request on a different index takes much less time (5 ms max)
- Disk space usage is around 6-7% on each data node as of now
- This is happening very intermittently; while typing this question, I am not able to reproduce the issue
- The throughput of the system using this cluster is currently around 100 requests per minute, which is quite low for ES
- Output of free -m on one of the data nodes (it is similar on all the other data nodes); the cluster-level view is in the sketch right after this list:

              total        used        free      shared  buff/cache   available
Mem:           6946        3943         268           0        2734        2635
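For a quick per-node view of heap, CPU, and load from the cluster itself (rather than logging into each machine for free -m), I am using the cat nodes API; this is a standard endpoint, and the column list is just the one I find useful here:

GET http://elastic_server_dns/_cat/nodes?v&h=name,node.role,heap.percent,ram.percent,cpu,load_1m

If heap.percent stays near its limit and cpu is pegged on the data nodes, that would point at the nodes being resource-starved rather than at anything specific to this query.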
Please let me know what other data points are required.
Update -
I have set the timeout at my ingress level to 60 s. After 60 s I am getting errors like -
2019/08/27 13:26:02 [error] 39#39: *633505 upstream timed out (110: Connection timed out)
while reading response header from upstream, client: xxx.xxx.xxx.xx, server: _, request: "POST
/prefix-2019-08-26/_doc/_search HTTP/1.1", upstream: "http://xx.xxx.xx.x:9200/prefix-2019-08-26/_doc/_search", host: "xx.xx.xxx.x"
which means some requests are even taking more than 60 s.
How can I debug this issue? I have set the ES data nodes and master nodes to log level debug, but I am not getting any useful information there.
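One thing I am planning to try next is enabling the search slow log on the affected index, so that slow requests get logged on the data nodes with their actual timings (a sketch; the thresholds are just my guesses, not recommended values):

PUT http://elastic_server_dns/prefix-2019-08-26/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "1s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}

With this in place, the slow requests should show up in the per-node slow log files along with the shard they ran on, which is much more targeted than raising the whole cluster to debug logging.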
Update on issue
I monitored the cluster more closely: almost all the data nodes have very high CPU usage (99%-100% on 7-8 nodes out of 10), and that is the reason for the slowness. Below is a screenshot of one of the nodes.
Can machines with this little RAM be the reason for this?
Each node has 6 GB RAM, out of which 49% is allocated to the ES Java process, and we can see the heap is almost full on all the nodes (top output).
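To see what the busy nodes are actually doing during a spike, I am going to capture hot threads and the search thread pool stats (both are standard APIs):

GET http://elastic_server_dns/_nodes/hot_threads
GET http://elastic_server_dns/_cat/thread_pool/search?v&h=node_name,active,queue,rejected

If hot_threads shows the time going into search (or merges/GC) and the search queue is backing up or rejecting requests, that would confirm the nodes are CPU/heap bound rather than this particular query being inherently slow.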