I have a 5-server cluster; each server has:
24 CPU cores
64 GB RAM (30 GB for Elasticsearch, the rest for file caches)
Lots of SSD storage (no spinning disks)
We have:
Nodes: 5
Indices: 455
Memory: 38.9 GB / 154.5 GB
Total Shards: 4329
Unassigned Shards: 0
Documents: 2,745,342,366
Data: 2.0 TB
The system is being used for logging, SIEM, etc., and has been running for about 5 months now. However, when I run searches across an index pattern, Kibana often errors with: "Discover: Gateway Timeout errors"
As the admin, I know the search keeps running in the background: usually, if the user waits another 30+ seconds and clicks 'Search' again in Kibana, the results are returned.
Is there any way to improve the end-user experience for slow-running searches, instead of users getting the error "Discover: Gateway Timeout errors" in Kibana?
I would say that one of the problems you are facing is that you have far too many shards given the volume of data and the size of the cluster. Please read this blog post on shards and sharding practices, and then try to reduce the shard count significantly.
Interesting.
I have been reindexing our daily indices into monthly indices to consolidate shards.
About a month ago we had ~16,000 shards.
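For reference, a minimal sketch of what one daily-to-monthly consolidation request could look like. It assumes a hypothetical naming convention of `logs-YYYY.MM.DD` for daily indices and `logs-YYYY.MM` for the monthly target; the code only builds the JSON body for `POST _reindex`, whose `source.index` accepts a list of index names. Reindex does not copy settings or mappings, so the monthly index would need to be created beforehand with the desired number of primary shards, and every listed daily index must exist or the request fails.

```python
import calendar
import json


def daily_indices(prefix, year, month):
    """List the (hypothetical) daily index names for one calendar month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return [f"{prefix}-{year:04d}.{month:02d}.{day:02d}"
            for day in range(1, days_in_month + 1)]


def monthly_reindex_body(prefix, year, month):
    """Build a _reindex body copying all daily indices into one monthly index."""
    return {
        "source": {"index": daily_indices(prefix, year, month)},
        "dest": {"index": f"{prefix}-{year:04d}.{month:02d}"},
    }


if __name__ == "__main__":
    # Body to send as: POST _reindex
    print(json.dumps(monthly_reindex_body("logs", 2019, 2), indent=2))
```

Once the document counts in the monthly index match, the daily indices can be deleted, which is what actually frees up the shards.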
2 TB across 4329 shards is less than 500 MB on average, which is very small. I would look to reduce the shard count by at least a factor of 10. As outlined in the blog post, an average shard size of a few tens of GB is quite reasonable and common.
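The arithmetic above can be checked with a few lines of Python. This is a rough sketch that assumes "2.0 TB" means 2 × 10^12 bytes and picks 30 GB as an illustrative target from the "few tens of GB" range; both counts include any replica shards, since the reported totals do.

```python
import math

GB = 10**9
TB = 10**12


def average_shard_size_gb(total_bytes, shard_count):
    """Average amount of data per shard, in GB."""
    return total_bytes / shard_count / GB


def shards_for_target_size(total_bytes, target_shard_gb):
    """Shards needed so each holds roughly target_shard_gb (rounded up, at least 1)."""
    return max(1, math.ceil(total_bytes / (target_shard_gb * GB)))


if __name__ == "__main__":
    total = 2.0 * TB   # "Data: 2.0 TB" from the cluster stats
    shards = 4329      # "Total Shards: 4329"
    needed = shards_for_target_size(total, 30)  # 30 GB is an assumed target
    print(f"average shard size: {average_shard_size_gb(total, shards):.2f} GB")
    print(f"shards at ~30 GB each: {needed}")
    print(f"reduction factor: {shards / needed:.0f}x")
```

With these inputs the average comes out at about 0.46 GB per shard, and roughly 67 shards would hold the same data at ~30 GB each, a reduction well beyond the factor of 10 suggested above.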
If you have a long retention period, you may also want to have a look at this webinar.