[Help!] Number of indexes and shards per node

Hello

I would appreciate some advice on the number of indices.

I have 3 Elasticsearch nodes, each with the following spec:

8 cores
64 GB RAM (30 GB heap)
48 TB storage (RAID 1+0)

Our requirements are:

  1. 60 GB/day, with an average of 500 bytes per event.
  2. 40 types of servers and network devices.
  3. Logs should be retained for about a year.

Ideally:

  1. I want to split the logs into 60 indices per day, each with 2 primary shards + 1 replica.
  2. Since each log type has a different context, and therefore different fields, each log type will go into its own index.
  3. The total will add up to 7,300 indices per node in a year (60 indices/day × 365 days = 21,900 indices across 3 nodes; at 2 primaries + 1 replica each, that is roughly 29,200 shards per node).
  4. Each index will contain about 2 million docs.

However, in reality each index consumes memory just by being open. Is there a good way to keep all the indices open? Perhaps an index alias would be a good idea in my case?
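To illustrate what I mean by an alias, something like this could group per-type indices under one read name (the index names here are just made-up examples):

    curl -XPOST 'localhost:9200/_aliases?pretty' -H 'Content-Type: application/json' -d'
    {
      "actions": [
        { "add": { "index": "syslog-2017.04.05",  "alias": "syslog"  } },
        { "add": { "index": "netflow-2017.04.05", "alias": "netflow" } }
      ]
    }'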

Do you really mean this?
You would need nearly 18 lakh (1.8 million) TB of space for that.

@Ravi_Shanker_Reddy

Actually no, my mistake. 60 GB/day is the correct traffic.

That many indices and shards sounds like a very, very bad idea. If you want to be able to hold a lot of data on your nodes, you will in my experience need a reasonably large average shard size, typically in the tens of GB.

I would recommend finding data that is similar in structure and putting it in the same index, and/or switching to monthly indices. If you have 60 monthly indices with 2 primary shards and 1 replica each, you will instead generate 240 shards per month, which gives 2,880 shards per year. That sounds much more reasonable.
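As a minimal sketch of how to apply those settings to new monthly indices (the logs-* naming pattern is just an example, adjust it to however you name your indices; the "template" pattern key shown here is the 2.x/5.x syntax), an index template could look like:

    curl -XPUT 'localhost:9200/_template/logs_monthly?pretty' -H 'Content-Type: application/json' -d'
    {
      "template": "logs-*",
      "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1
      }
    }'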

@Christian_Dahlqvist

Thank you for the reply. You saved my day!

So, I know it depends on the field structure per document, but I understand that keeping the shard count around **2,500-3,000** per cluster is a good starting point.

Is there any way to check heap memory usage at the shard level? Perhaps heap_used_in_bytes / number of shards would work as a rough number?

 "jvm" : {
        "timestamp" : 1491448005386,
        "uptime_in_millis" : 14338678,
        "mem" : {
          "heap_used_in_bytes" : 5811370816,

3,000 shards across 3 data nodes is still a lot, but it could be manageable. Depending on your mappings and how much space the data takes up on disk, you may very well need more than 3 nodes in the end.

@Christian_Dahlqvist

Got it, thanks. I will consider reducing the number of shards further, e.g. setting shards per index to 1, or increasing the number of nodes. I will also keep the number of fields as small as possible.

Also, is heap memory usage currently the only way of tracking memory usage per shard?

I do not think there is any way to exactly determine the amount of heap used per shard. I generally recommend having hundreds rather than thousands of shards per node for log analytics use cases where the nodes have ~30 GB heap. The exact limit will depend on the use case, though.
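That said, one partial data point you can get is the segment memory held per shard, via the cat shards API. Note this covers only Lucene segment memory, which is just one part of the per-shard heap overhead, so treat it as an underestimate:

    # Heap held by Lucene segments, reported per shard
    curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,node,segments.memory'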

@Christian_Dahlqvist

Got it! Thanks. We decided to start with the below and see how it goes.

  • 61 indices, 2 primary shards, 1 replica each.
  • Monthly data retention with curator (see the sketch below the calculation).

Total shards per year = 61 indices × 4 shards × 12 months
= 2,928 shards / cluster
= 976 shards / node
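For the retention piece, the curator action file would look roughly like this (the logs- prefix and the %Y.%m name pattern are assumptions based on monthly index names; this is curator 4.x-style YAML, run as `curator --config config.yml delete_old.yml`):

    # delete_old.yml
    actions:
      1:
        action: delete_indices
        description: "Delete monthly indices older than 12 months"
        options:
          ignore_empty_list: True
        filters:
        - filtertype: pattern
          kind: prefix
          value: logs-
        - filtertype: age
          source: name
          direction: older
          timestring: '%Y.%m'
          unit: months
          unit_count: 12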
