[Help!] Number of indexes and shards per node

Hello

I would appreciate some advice on the number of indices.

I have 3 Elasticsearch nodes, each with the following spec:

8 cores
64 GB RAM (30 GB heap)
48 TB storage (RAID 1+0)

Our requirements are:

  1. 60 GB/day, with an average of 500 bytes per event.
  2. 40 types of servers and network devices.
  3. Logs should be retained for about a year.

Ideally:

  1. I want to split the logs into 60 indices per day, each with 2 primary shards + 1 replica.
  2. Since each log type has a different context, and therefore different fields, each log type will go into its own index.
  3. The total will add up to 7,300 indices per node in a year (60 indices/day × 365 days = 21,900 indices across 3 nodes; at 2 primaries + 1 replica each, that is roughly 29,200 shards per node).
  4. Each index will contain about 2 million docs.

However, in reality each index consumes memory just by being open. Is there a good way to keep all the indices open? Perhaps an index alias would be a good idea in my case?
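To illustrate what I mean by an alias, something like this could group per-type indices under one read name (the index names here are just made-up examples):

    curl -XPOST 'localhost:9200/_aliases?pretty' -H 'Content-Type: application/json' -d'
    {
      "actions": [
        { "add": { "index": "syslog-2017.04.05",  "alias": "syslog"  } },
        { "add": { "index": "netflow-2017.04.05", "alias": "netflow" } }
      ]
    }'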

Do you really mean this?
You would need nearly 18 lakh (1.8 million) TB of space for that.

@Ravi_Shanker_Reddy

Actually no, my mistake. 60 GB/day is the correct traffic.

That many indices and shards sounds like a very, very bad idea. If you want to be able to hold a lot of data on your nodes, you will in my experience need a reasonably large average shard size, typically in the tens of GB.

I would recommend finding data that is similar in structure and putting it in the same index, and/or switching to monthly indices. If you have 60 monthly indices with 2 primary shards and 1 replica each, you will instead generate 240 shards per month, which gives 2,880 shards per year. That sounds much more reasonable.
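As a minimal sketch of how to apply those settings to new monthly indices (the logs-* naming pattern is just an example, adjust it to however you name your indices; the "template" pattern key shown here is the 2.x/5.x syntax), an index template could look like:

    curl -XPUT 'localhost:9200/_template/logs_monthly?pretty' -H 'Content-Type: application/json' -d'
    {
      "template": "logs-*",
      "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1
      }
    }'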

@Christian_Dahlqvist

Thank you for the reply. You saved my day!

So, I know it depends on the field structure per document, but I understand that keeping the shard count around **2,500-3,000** per cluster is a good starting point.

Is there any way to check heap memory usage at the shard level? Perhaps heap_used_in_bytes / number of shards would work as a rough number?

 "jvm" : {
        "timestamp" : 1491448005386,
        "uptime_in_millis" : 14338678,
        "mem" : {
          "heap_used_in_bytes" : 5811370816,

3,000 shards across 3 data nodes is still a lot, but it could be manageable. Depending on your mappings and how much space the data takes up on disk, you may very well need more than 3 nodes in the end.

@Christian_Dahlqvist

Got it, thanks. I will consider reducing the number of shards further, e.g. setting shards per index to 1, or increasing the number of nodes. I will also keep the number of fields as small as possible.

Also, is heap memory usage currently the only way of tracking memory usage per shard?

I do not think there is any way to exactly determine the amount of heap used per shard. I generally recommend having hundreds rather than thousands of shards per node for log analytics use cases where the nodes have ~30 GB heap. The exact limit will depend on the use case, though.
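That said, one partial data point you can get is the segment memory held per shard, via the cat shards API. Note this covers only Lucene segment memory, which is just one part of the per-shard heap overhead, so treat it as an underestimate:

    # Heap held by Lucene segments, reported per shard
    curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,node,segments.memory'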

@Christian_Dahlqvist

Got it! Thanks. We decided to start with the below and see how it goes.

  • 61 indices, 2 primary shards, 1 replica each.
  • Monthly data retention with curator (see the sketch below the calculation).

Total shards per year = 61 indices × 4 shards × 12 months
= 2,928 shards / cluster
= 976 shards / node
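For the retention piece, the curator action file would look roughly like this (the logs- prefix and the %Y.%m name pattern are assumptions based on monthly index names; this is curator 4.x-style YAML, run as `curator --config config.yml delete_old.yml`):

    # delete_old.yml
    actions:
      1:
        action: delete_indices
        description: "Delete monthly indices older than 12 months"
        options:
          ignore_empty_list: True
        filters:
        - filtertype: pattern
          kind: prefix
          value: logs-
        - filtertype: age
          source: name
          direction: older
          timestring: '%Y.%m'
          unit: months
          unit_count: 12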
