Dynamic growing, a solution for a fixed shard number?


(makovskij) #1

Choosing the number of shards for an index is always a problem.
We know how much data we get per month (about 150 GB), and since many people say a shard shouldn't be larger than 50 GB, it would fit into 3 shards.

If we use 5 primary shards per index and 1 replica (= 10 shards in total), it would fit perfectly with our 10 data nodes.
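
As a rough check of the arithmetic above, here is a minimal Python sketch. The function names are made up for illustration, and the 50 GB figure is the rule of thumb mentioned above, not a hard Elasticsearch limit.

```python
import math


def min_primary_shards(data_gb: float, max_shard_gb: float) -> int:
    """Smallest number of primary shards that keeps each shard at or below max_shard_gb."""
    return math.ceil(data_gb / max_shard_gb)


def total_shards(primaries: int, replicas: int) -> int:
    """Each replica is a full copy of every primary shard."""
    return primaries * (1 + replicas)


monthly_gb = 150   # data volume per month, from the post
max_shard_gb = 50  # rule-of-thumb upper bound per shard

print(min_primary_shards(monthly_gb, max_shard_gb))  # 3
print(total_shards(5, 1))                            # 10
print(monthly_gb / 5)                                # 30.0 GB per primary shard
```

With 5 primaries each shard would hold about 30 GB per month, which stays under the 50 GB guideline, and the 10 shard copies could sit one per data node.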

Now the idea:
We periodically check whether the current index has reached a criterion (e.g. 50 GB per shard) and, if so, create a new index.
We would set a filtered alias for a user, and with this alias we can keep that user's searches separate while searching across all the indices.
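
The idea could be sketched roughly like this in Python. This is only an illustration under assumptions: the helper names, the index names, the `user_id` field, and the use of the average primary shard size as the criterion are all hypothetical. The size in bytes would have to come from the cluster (for example from the index stats), and the returned dictionary is meant as the request body for the `_aliases` endpoint; no request is actually sent here.

```python
import json

GB = 1024 ** 3


def should_roll_over(primary_store_bytes: int, primary_shards: int,
                     max_shard_bytes: int = 50 * GB) -> bool:
    """True when the average primary shard size has reached the limit.

    Assumes documents are spread evenly, so the average stands in for
    the size of each individual shard.
    """
    return primary_store_bytes / primary_shards >= max_shard_bytes


def filtered_alias_actions(alias: str, indices: list, user_id: str) -> dict:
    """Build an _aliases body that adds one filtered alias over all given indices."""
    return {
        "actions": [
            {"add": {"index": index, "alias": alias,
                     # hypothetical field that identifies the owning user
                     "filter": {"term": {"user_id": user_id}}}}
            for index in indices
        ]
    }


indices = ["data-000001", "data-000002"]  # made-up index names
print(should_roll_over(260 * GB, 5))      # True: 52 GB per shard on average
print(json.dumps(filtered_alias_actions("user-42", indices, "42"), indent=2))
```

Searching through the alias `user-42` would then cover every listed index but only return documents matching the filter.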

The other question with this idea: is there a difference between setting 1 shard per index and, e.g., 5 shards?
Or is there a disadvantage to creating many indices with one shard each, compared with
a fifth as many indices with 5 shards each?

What do you think and recommend?

regards
Andi


(Sarwar Bhuiyan) #2

I guess the answer is it depends on:

  1. Write ingestion speeds/volumes
  2. Read throughput (expected) and the types of queries
  3. Other factors?

You can have one primary shard per index, but it's not the same as 5 shards, because with 5 shards your data is spread across multiple machines for writes.
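
To illustrate the point, a document is assigned to a primary shard roughly as hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. The sketch below is a simplified model, not Elasticsearch's actual code: `zlib.crc32` is swapped in as a stand-in hash, and the document IDs are made up.

```python
import zlib
from collections import Counter


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Pick a shard as hash(routing) % number_of_primary_shards.

    zlib.crc32 is only a stand-in; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]  # made-up document IDs

one_shard = Counter(shard_for(d, 1) for d in doc_ids)
five_shards = Counter(shard_for(d, 5) for d in doc_ids)

print(dict(one_shard))    # every document lands on shard 0
print(dict(five_shards))  # documents are spread over several shards
```

With a single primary shard, every write to that index goes to one shard on one node; with five, the writes can be shared among up to five nodes.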

The same goes for replica shards. How heavy are your reads, and what sorts of queries do you want to run?

It's worth trying things out on one node with one shard and taking some measurements for your use cases. Have a look at https://www.elastic.co/guide/en/elasticsearch/guide/current/capacity-planning.html and see if you can run some experiments to arrive at sensible numbers to start with.

There are no universal sweet spots that will cover all use cases.

All the best.

Sarwar

