I would appreciate some advice on the number of indices.
I have 3 Elasticsearch nodes, each with the following spec:
8 cores
64 GB RAM (30 GB heap)
48 TB disk (RAID 1+0)
Our requirements are:
60 GB/day, with an average of 500 bytes per event
40 types of servers and network devices
Logs should be retained for about a year
Ideally, I want to split the logs into 60 indices per day, each with 2 primary shards and 1 replica.
Each log type has a different context and therefore different fields, so each log type would go into its own index.
That adds up to 21,900 indices in a year, or about 7,300 indices per node.
Each index will contain about 2 million docs.
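For anyone checking the numbers above, here is a rough sketch of the arithmetic. It assumes 1 GB = 10^9 bytes and a 365-day retention period; the variable names are only illustrative.

```python
# Figures taken from the post; GB is assumed to mean 10**9 bytes.
DAILY_BYTES = 60 * 10**9
AVG_EVENT_BYTES = 500
INDICES_PER_DAY = 60
PRIMARIES = 2
REPLICAS = 1
NODES = 3
RETENTION_DAYS = 365  # "about a year" (assumption)

events_per_day = DAILY_BYTES // AVG_EVENT_BYTES
docs_per_index = events_per_day // INDICES_PER_DAY
indices_per_year = INDICES_PER_DAY * RETENTION_DAYS
# Each index has PRIMARIES primary shards plus REPLICAS copies of each.
shards_per_index = PRIMARIES * (1 + REPLICAS)
shards_per_year = indices_per_year * shards_per_index

print("events per day:   ", events_per_day)
print("docs per index:   ", docs_per_index)
print("indices per year: ", indices_per_year)
print("indices per node: ", indices_per_year // NODES)
print("shards per year:  ", shards_per_year)
print("shards per node:  ", shards_per_year // NODES)
```

Note that the shard count, not the index count, is what each node actually has to carry, and with replicas it is four times the number of indices.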
In reality, however, each index consumes memory just by being open. Is there a good way to keep all the indices open? Perhaps index aliases would be a good idea in my case?
That many indices and shards sounds like a very, very bad idea. If you want to be able to hold a lot of data on your nodes, you will in my experience need a reasonably large average shard size, typically in the tens of GB.
I would recommend finding data that is similar in structure and putting it in the same index, and/or switching to monthly indices. If you have 60 monthly indices with 2 primary shards and 1 replica, you will instead generate 240 shards per month, which gives 2,880 shards per year. That sounds more reasonable.
3000 shards across 3 data nodes is still a lot, but could be manageable. Depending on your mappings and how much space data takes up on disk, you may very well need more than 3 nodes in the end.
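To make the comparison concrete, here is a minimal sketch that compares the daily and monthly layouts. It uses the raw ingest volume (60 GB/day) as a stand-in for on-disk size, which is an assumption: the real size depends on mappings and compression. The function names are made up for illustration.

```python
GB_PER_DAY = 60          # raw ingest volume from the original post
DAYS_PER_YEAR = 365
INDEX_TYPES = 60         # one index per log type per period

def total_shards(periods_per_year, primaries=2, replicas=1):
    """Shards held after one year of retention (primaries plus replicas)."""
    return INDEX_TYPES * periods_per_year * primaries * (1 + replicas)

def avg_primary_shard_gb(periods_per_year, primaries=2):
    """Average primary shard size, assuming data is spread evenly
    across index types and on-disk size equals raw size."""
    gb_per_index = GB_PER_DAY * DAYS_PER_YEAR / (INDEX_TYPES * periods_per_year)
    return gb_per_index / primaries

for name, periods in (("daily", 365), ("monthly", 12)):
    print(f"{name:8s} shards/year={total_shards(periods):6d} "
          f"avg primary shard={avg_primary_shard_gb(periods):.1f} GB")
```

Under these assumptions, daily indices give shards of about 0.5 GB each, while monthly indices give roughly 15 GB per primary shard, which is much closer to the tens-of-GB range mentioned above.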
Got it, thanks. I will consider reducing the number of shards further, e.g. by setting the number of primary shards per index to 1, or increasing the number of nodes. I will also keep the number of fields to a minimum.
Also, is heap memory usage currently the only way of tracking memory usage per shard?
I do not think there is any way to exactly determine the amount of heap used per shard. I generally recommend having hundreds rather than thousands of shards per node for log analytics use cases where the nodes have ~30GB heap. The exact limit will depend on the use case though.
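As a rough way to see how the options discussed in this thread compare with the "hundreds rather than thousands" guideline, here is a small sketch. It assumes 60 monthly indices kept for 12 months with 1 replica; the 5-node case is purely a hypothetical example, not a sizing recommendation.

```python
def shards_per_node(nodes, primaries, replicas=1, indices_per_month=60, months=12):
    """Average shards per node after a full retention period,
    assuming shards are balanced evenly across nodes."""
    total = indices_per_month * months * primaries * (1 + replicas)
    return total / nodes

# (label, node count, primary shards per index)
options = [
    ("3 nodes, 2 primaries", 3, 2),
    ("3 nodes, 1 primary", 3, 1),
    ("5 nodes, 1 primary", 5, 1),   # hypothetical cluster size
]

for label, nodes, primaries in options:
    print(f"{label:22s} -> {shards_per_node(nodes, primaries):.0f} shards per node")
```

Dropping to 1 primary shard per index halves the per-node count, and merging similar log types into shared indices would reduce it further.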