Hi all,
here is my current setup:
ELK: 6.3
Cluster: 8 data nodes, each with ~24 TB of storage dedicated to Elasticsearch
Daily data: ~250 GB primary data, ~520,000,000 documents
The indices are created daily. The index template is currently set to 8 primary shards (~31 GB per shard) with 3 replicas for search performance.
Current problem: if my calculations are correct, I can keep the data for roughly 192 days, but it needs to be stored for longer. Closing indices is not an option, because we may need to search over, for example, a whole year. Furthermore, these logs make up most of our data, but not all of it: other logs are written to a different index prefix (and are deleted after a certain time), so a little space has to remain free for them.
How would you (or do you) handle the indices to extend the retention time while still being able to search over a large time range? A full-text search without field restrictions over the last 24 h currently takes ~40 seconds.
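The 192-day figure can be sanity-checked with a few lines of Python. This is only a rough sketch under stated assumptions: decimal units (1 TB = 1000 GB), no space reserved for the other index prefixes, and a hypothetical `usable_fraction` parameter to model disk headroom (it is not an Elasticsearch setting).

```python
# Rough retention estimate from the numbers in this post.
# Assumptions: 1 TB = 1000 GB, daily volume stays at ~250 GB primary.
NODES = 8
DISK_PER_NODE_GB = 24_000
DAILY_PRIMARY_GB = 250


def retention_days(replicas: int, usable_fraction: float = 1.0) -> float:
    """Days of daily indices that fit on disk for a given replica count.

    usable_fraction is a hypothetical knob for keeping part of the disk free.
    """
    total_gb = NODES * DISK_PER_NODE_GB * usable_fraction
    daily_gb = DAILY_PRIMARY_GB * (1 + replicas)  # primaries plus replica copies
    return total_gb / daily_gb


for replicas in (3, 2, 1):
    print(f"{replicas} replicas: {retention_days(replicas):.0f} days")
```

With 3 replicas this reproduces the 192 days above; the replica count is the main lever in the formula, since every replica stores a full copy of the primary data.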
What is the specification of your cluster, hardware and storage?
8 nodes, each with 24 cores (HT), 64 GB RAM and 24 TB of HDDs dedicated to Elasticsearch; 4 of them also run Logstash as an indexer.
How many queries are you serving per second given you have set up so many replicas?
Not many. We have dashboards and occasionally run queries for special searches.
We use it for central network logs. I used so many replicas because of the slides at https://de.slideshare.net/swallez/black-friday-logs-scaling-elasticsearch (slide no. 57). I understood them as "the more data you are searching in, the more replicas you should have for search speed". So, when I search in billions of documents, I need the replicas for performance(?). The number of shards is based on index size, as one shard should not be bigger than 50 GB and one index is 250 GB.
What type of queries are run? What is the use case?
Dashboards and queries in the Discover tab, searching in network logs.
How much data do you currently have in the cluster?
I have only just started, so there is not much yet.
Total Shards: 298
Documents: 1,562,516,987
Data: 2.8 TB ( = 700 GB primary)
You typically scale out the number of replicas in order to handle more concurrent queries. If you have few concurrent queries I would probably recommend having only 1 replica. This will cut the amount of data in the cluster in half compared to the current configuration.
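As a minimal sketch of that change: the replica count of existing indices can be changed with the update index settings API (`PUT /<index>/_settings`). The helper below only builds the request so it can be sent with curl or the Kibana Dev Tools console; the pattern `networklogs-*` is a hypothetical index name, not one taken from this thread.

```python
import json


def replica_update(index_pattern: str, replicas: int):
    """Return (method, path, body) for the update index settings API."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index_pattern}/_settings", json.dumps(body)


# "networklogs-*" is a placeholder; substitute the real index prefix.
method, path, body = replica_update("networklogs-*", 1)
print(method, path)
print(body)
```

This only affects indices that already exist. The index template needs the same `number_of_replicas` value, otherwise each new daily index will again be created with 3 replicas.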
OK, thanks for the tip, I reduced the replicas to 1. Any more tips? Does it make sense to reindex last month's daily indices into one big monthly index and delete the dailies?
If you want to put that much data on the nodes you will need to manage heap usage carefully, e.g. use large shards, force merge to minimize the number of segments, optimise mappings to reduce heap usage, and possibly also use coordinating nodes for querying to minimize the load on the data nodes.
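One possible way to combine the monthly reindex idea with the force merge advice is sketched below. It only builds the two REST calls (the reindex API and the force merge API) and does not send them. It assumes daily indices named `<prefix>-YYYY.MM.dd`, which is the Logstash default pattern but is not confirmed in this thread; `networklogs` is a hypothetical prefix.

```python
import json


def monthly_rollup_requests(prefix: str, year: int, month: int):
    """Build the reindex and force merge calls for one month of daily indices."""
    monthly = f"{prefix}-{year}.{month:02d}"
    reindex_body = {
        "source": {"index": f"{monthly}.*"},  # all daily indices of that month
        "dest": {"index": monthly},           # the single monthly index
    }
    return [
        # Run the reindex as a background task; it can take a long time.
        ("POST", "/_reindex?wait_for_completion=false", json.dumps(reindex_body)),
        # Once the monthly index is no longer written to, merge each shard
        # down to one segment.
        ("POST", f"/{monthly}/_forcemerge?max_num_segments=1", ""),
    ]


for method, path, body in monthly_rollup_requests("networklogs", 2018, 6):
    print(method, path, body)
```

The monthly index should be created beforehand (or via a template) with a suitable shard count: at ~250 GB per day, a month is roughly 7.5 TB of primary data, so it only helps if the resulting shards are larger than the current daily ones. The daily indices should be deleted only after the document counts of source and destination have been compared.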