Determine shards count for several indices

Yuval · July 26, 2017, 2:05pm

Hey, Elastic newbie here.

TL;DR:
I have few indices which will store LOTS of data (50GB + ) compared to other indices which store much less (1-2GB tops).
All the indices share the same default configuration (5 shards, 1 replica) and I think it won't be optimal in the future.

My actual question is - How should I divide my resources and decide how many shards and replicas every index has?

I'm currently running an elastic cluster on 4 nodes (servers) and am using the default settings in basically everything.
I've had no problems so far, but I'm trying to look ahead and I've come to realize that my indices are probably configured incorrectly.
Each of my clusters has 32GB RAM, 16 cores processors and around 4TB HD.
I read a few articles online that recommend to overallocate shards in indices which are expected to hold lots of data (and when your hardware can afford it, and I think my hardware can), but I'm still not sure on how to determine these settings.

so again, my question is - How should I divide my resources and decide how many shards and replicas every index has?

Thanks in advance,
Yuval

warkolm · July 26, 2017, 8:00pm

What sort of data is it?

Yuval · July 27, 2017, 7:02am

The "large" index contains network traffic (nested data), the small index holds network which I consider 'unusual' by my criteria..

warkolm · July 27, 2017, 8:00am

So time based?
Use the _rollover API based on a size and then you don't need to worry so much.

Yuval · July 27, 2017, 12:21pm

I don't think you understood my question,
My concern is that my physical resources are not distributed correctly, because I THINK that the traffic index should have more 'searching power' than the smaller index which has much less data in it...

How should I determine which index needs how many shards?

warkolm · July 28, 2017, 1:42am

Yep I get that.
My idea is that you can just define a rollover pattern with N shards, then use that rollover API to create a new underlying index when you get to a predetermined shard size. This helps negate how many shards you need to satisfy your resources concerns because it'll create new indices when those resources have reached a point you've determined is the maximum, and the shards are spread across the cluster.

So you will have two rollover patterns;

Network traffic - that creates new indices (ie more shards) reasonably often to cater for the higher volumes, which means more resources for searching and indexing
Unusual traffic - will only create new indices when it needs to, keeping resource use limited

The key here is what is that maximum and that needs testing based on your data type(s), available resources and other requirements like indexing and search timing requirement.
We do recommend a maximum of 50GB per shard if that's of any assistance.

system · August 25, 2017, 1:42am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Appropriate Number of Indices/Shards for My Cluster Elasticsearch	2	706	November 20, 2019
A few questions optimizing our Elastic Stack Elasticsearch	6	376	July 11, 2018
Proper shard and replica settings Elasticsearch	6	111	August 5, 2024
Shards and replicas allocation in elasticsearch Elasticsearch	7	474	December 17, 2018
How to chose the number of shards and replica Elasticsearch	3	312	October 19, 2020

Determine shards count for several indices

Related topics