Determine shards count for several indices


#1

Hey, Elastic newbie here.

TL;DR:
I have few indices which will store LOTS of data (50GB + ) compared to other indices which store much less (1-2GB tops).
All the indices share the same default configuration (5 shards, 1 replica) and I think it won't be optimal in the future.

  • My actual question is - How should I divide my resources and decide how many shards and replicas every index has?

I'm currently running an elastic cluster on 4 nodes (servers) and am using the default settings in basically everything.
I've had no problems so far, but I'm trying to look ahead and I've come to realize that my indices are probably configured incorrectly.
Each of my clusters has 32GB RAM, 16 cores processors and around 4TB HD.
I read a few articles online that recommend to overallocate shards in indices which are expected to hold lots of data (and when your hardware can afford it, and I think my hardware can), but I'm still not sure on how to determine these settings.

so again, my question is - How should I divide my resources and decide how many shards and replicas every index has?

Thanks in advance,
Yuval


(Mark Walkom) #2

What sort of data is it?


#3

The "large" index contains network traffic (nested data), the small index holds network which I consider 'unusual' by my criteria..


(Mark Walkom) #4

So time based?
Use the _rollover API based on a size and then you don't need to worry so much.


#5

I don't think you understood my question,
My concern is that my physical resources are not distributed correctly, because I THINK that the traffic index should have more 'searching power' than the smaller index which has much less data in it...

How should I determine which index needs how many shards?


(Mark Walkom) #6

Yep I get that.
My idea is that you can just define a rollover pattern with N shards, then use that rollover API to create a new underlying index when you get to a predetermined shard size. This helps negate how many shards you need to satisfy your resources concerns because it'll create new indices when those resources have reached a point you've determined is the maximum, and the shards are spread across the cluster.

So you will have two rollover patterns;

  1. Network traffic - that creates new indices (ie more shards) reasonably often to cater for the higher volumes, which means more resources for searching and indexing
  2. Unusual traffic - will only create new indices when it needs to, keeping resource use limited

The key here is what is that maximum and that needs testing based on your data type(s), available resources and other requirements like indexing and search timing requirement.
We do recommend a maximum of 50GB per shard if that's of any assistance.


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.