The config on my nodes is:
- Running version 5.3.0 of ES, Logstash, and Kibana
- 8 CPUs
- 250 GB SSD drives
- They have 40 GB of RAM
- I allocated 16 GB to the heap
- 5 ES nodes
- Index configured for 5 shards, 1 replica
- Avg shard size for the index is about 20 GB (a little over 20,000,000 docs; one way to pull these numbers is sketched right after this list)
- avg CPU utilization on the nodes doesn't increase past 10%
- Just searching in the Kibana Discover page, without any visualizations or aggregations
- Searches over the past few hours are very quick, but searching back more than 12 hours takes almost a minute, and 24 hours can take longer than 90 seconds
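(For reference, here's a minimal sketch of how shard sizes and doc counts like the ones above can be pulled from the cluster; `http://localhost:9200` and the `logstash-*` index pattern are just placeholders for my actual node address and index.)

```python
import requests

ES = "http://localhost:9200"   # placeholder for one of my ES nodes
INDEX = "logstash-*"           # placeholder for my index pattern

# _cat/shards lists each shard's doc count and on-disk size
resp = requests.get(
    f"{ES}/_cat/shards/{INDEX}",
    params={"v": "true", "h": "index,shard,prirep,docs,store"},
)
print(resp.text)
```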
The bulk thread pool queue size on my nodes consistently hovers around 1-2, and I see no other queue types building up. I see no rejections.
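(One way to check the thread pool queues and rejection counts, as a sketch using the same placeholder host:)

```python
import requests

ES = "http://localhost:9200"  # placeholder for one of my ES nodes

# _cat/thread_pool shows active threads, queue depth, and rejections per pool on each node
resp = requests.get(
    f"{ES}/_cat/thread_pool",
    params={"v": "true", "h": "node_name,name,active,queue,rejected"},
)
print(resp.text)
```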
Here is a chart representing my heap metrics (16 GB allocated heap, the green line) and heap in use (the blue line):
All my ES nodes exhibit this same pattern.
The heap in use rises to about 10.84 GB before dropping off. From what I understand this is a healthy pattern for this metric: the heap is 16 GB, and GC kicks in when heap usage hits 75% (which would be 12 GB).
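(And a sketch of pulling the same heap-used vs. heap-max numbers straight from the nodes stats API, in case the chart alone isn't enough; again, the host is a placeholder:)

```python
import requests

ES = "http://localhost:9200"  # placeholder for one of my ES nodes

# _nodes/stats/jvm reports per-node heap used vs. heap max
stats = requests.get(f"{ES}/_nodes/stats/jvm").json()
for node in stats["nodes"].values():
    mem = node["jvm"]["mem"]
    used_gb = mem["heap_used_in_bytes"] / 1024 ** 3
    max_gb = mem["heap_max_in_bytes"] / 1024 ** 3
    print(f'{node["name"]}: {used_gb:.2f} GB used of {max_gb:.2f} GB heap')
```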
So why are my searches still slow? :( When I search back 12 hours or more it churns, and occasionally Kibana times out (I currently have Kibana's request timeout set to 90 seconds).
I know this could also depend on my index itself, but I want to rule out any server config first because I like my index the way it is.
What other metrics should I investigate here? I can give the nodes more memory, but should I also allocate more of it to the heap, or is the heap in a good place right now? From what I understand, the non-heap memory is used by Lucene (through the OS filesystem cache), so adding RAM there would improve search performance, right?