Correct sharding and indexes for Logging System


(Cam) #1

Hello All,

I was wondering if anyone can provide any guidance on the following points.

I have a 5-server cluster, each server with:
24 CPU Cores
64 GB RAM (30 GB for Elasticsearch, the rest for file caches)
Lots of SSD disk

The system is being used as a logging system (SIEM etc.). It has been running for about 4 months now, but I'm not sure about the sharding.

Already we have:
Nodes: 5 Indices: 1636 Memory: 65.9 GB / 154.5 GB Total Shards: 16165 Unassigned Shards: 0 Documents: 1,845,342,366 Data: 1.0 TB

I believe we are running the default sharding of 5 per index, across 7 index patterns, with each index-pattern generating a new index each day.

Questions:
1) Is this sharding and index creation just plain wrong, given the use case (logging)?
2) At what point do I require dedicated master nodes? (At present they are just 5x data+master nodes.)

Thanks


(Christian Dahlqvist) #2

Yes, that does not look good. You seem to have far too many indices and shards for a cluster that size. Please read this blog post on shards and sharding for practical guidance. I would recommend rethinking how you approach sharding and reducing the number of indices and shards dramatically.
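
To put rough numbers on "far too many", here is a minimal Python sketch that only redoes the arithmetic on the cluster stats from post #1. It assumes that "1.0 TB" means about 1000 GB and covers all shards, and the ceiling of 20 shards per GB of heap is an assumed rule of thumb for illustration, not something stated in this thread. The function name shard_summary is hypothetical.

```python
# Figures copied from the cluster stats in post #1.
NODES = 5
INDICES = 1636
TOTAL_SHARDS = 16165
DATA_GB = 1000.0        # "1.0 TB", treated as 1000 GB across all shards (assumption)
HEAP_GB_PER_NODE = 30   # "30 for elastic", read here as heap (assumption)

# Assumed rule of thumb, not taken from this thread.
ASSUMED_MAX_SHARDS_PER_HEAP_GB = 20


def shard_summary():
    """Return simple per-node and per-shard averages for the cluster."""
    return {
        "shards_per_node": TOTAL_SHARDS / NODES,
        "shards_per_index": TOTAL_SHARDS / INDICES,
        "avg_shard_mb": DATA_GB * 1000 / TOTAL_SHARDS,
        "assumed_ceiling_per_node": HEAP_GB_PER_NODE * ASSUMED_MAX_SHARDS_PER_HEAP_GB,
    }


if __name__ == "__main__":
    for name, value in shard_summary().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions each node holds over 3,000 shards, and the average shard is well under 100 MB, which is the pattern of many tiny shards that the advice above is warning about.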

It often depends on the load of the system. If nodes are too busy and you start seeing long GC, master elections or cluster instability, it is probably time to add dedicated master nodes.


(Cam) #3

Also, the current document count for one index (as an example) is 17 million.
This will only grow, perhaps by a factor of 10, as more systems are onboarded and send data.


(Christian Dahlqvist) #4

If you are looking to handle larger data volumes this webinar may also be useful:

https://www.elastic.co/webinars/optimizing-storage-efficiency-in-elasticsearch