Bulk indexing performance: 10 indices of 1 shard vs. 1 index of 10 shards

From the docs, I read that query performance is the same whether you query 10 indices of 1 shard each or 1 index of 10 shards.

  1. What about bulk indexing performance in the same situation: bulk indexing into 10 indices of 1 shard each vs. 1 index of 10 shards?

Logs arrive in the system continuously throughout the day. Queries must be supported across all of the data (one year's worth), but they will be infrequent. I am planning to use the rollover API (https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html) to roll over to a new index when the current one becomes large.
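A minimal sketch of what the rollover call could look like, written as a Python helper that only builds the request (it does not send anything). It assumes a hypothetical write alias named logs-write that points at an initial index such as logs-000001; the 50 GB-per-shard target and the 1d max_age are assumed example values, not recommendations from this thread. Note that the max_size condition measures the total size of all primary shards, so for a 5-shard index the per-shard target has to be multiplied by the shard count.

```python
import json


def rollover_request(alias, shards=5, max_shard_gb=50, max_age="1d"):
    """Build (method, path, body) for a rollover call; nothing is sent here.

    max_size counts all primary shards together, so the per-shard target
    is multiplied by the number of primaries. The index rolls over as soon
    as ANY condition is met.
    """
    body = {
        "conditions": {
            "max_size": f"{shards * max_shard_gb}gb",
            "max_age": max_age,  # assumed example value
        },
        # Settings for the new index that the rollover creates.
        "settings": {"index.number_of_shards": shards},
    }
    return "POST", f"/{alias}/_rollover", body


# "logs-write" is a hypothetical write alias.
method, path, body = rollover_request("logs-write")
print(method, path)
print(json.dumps(body, indent=2))
```

With these defaults the body asks for a rollover at 250gb or after one day, whichever comes first.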

With the rollover API, there is a trade-off to figure out:

  2. How much data should be kept in a single index before rolling over to a new one?

  3. Also, how many shards should an index have when we use rollover? Should we keep all the data in a single shard and roll over, or keep it in, for example, 5 shards and roll over?

I would like some recommendations as a baseline for starting my tests.

More shards usually means faster writes (unless you go crazy).

Given that you are using rollover, you can also use _shrink: start with more shards, then reduce the shard count for longer-term storage.
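A minimal sketch of the shrink step, again as request builders only. It encodes two constraints of the shrink API: the target shard count must be a factor of the source shard count, and before shrinking, the source index has to be made read-only with a copy of every shard relocated to one node. The index names and the node name cold-node-1 are hypothetical.

```python
def valid_shrink_targets(source_shards):
    """Shard counts a source index can be shrunk to (factors of the source count)."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def shrink_requests(source, target, node, source_shards, target_shards):
    """Build the two requests for a shrink; nothing is sent here."""
    if target_shards not in valid_shrink_targets(source_shards):
        raise ValueError(f"{target_shards} is not a factor of {source_shards}")
    # Step 1: block writes and pull a copy of every shard onto one node.
    prep = ("PUT", f"/{source}/_settings", {
        "settings": {
            "index.routing.allocation.require._name": node,
            "index.blocks.write": True,
        }
    })
    # Step 2: shrink, clearing the temporary settings on the new index.
    shrink = ("POST", f"/{source}/_shrink/{target}", {
        "settings": {
            "index.number_of_shards": target_shards,
            "index.routing.allocation.require._name": None,
            "index.blocks.write": None,
        }
    })
    return [prep, shrink]


print(valid_shrink_targets(5))
print(valid_shrink_targets(10))
reqs = shrink_requests("logs-000001", "logs-000001-shrunk", "cold-node-1", 10, 2)
```

One consequence worth noting: a 5-shard index can only be shrunk to 1 shard, while a 10-shard index can go to 5, 2 or 1, which leaves more room to tune the cold-tier shard size.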

I have a high ingest rate and concurrent searches over the entire data set (hot and cold). I am not very concerned about search latency, but I have to ensure cluster stability (even during extreme searches) and a high indexing rate. As you suggested, I will have an index with 5 shards and index up to 500 GB (approx. 100 GB per shard) on the hot nodes. Later, when I roll over and move that index to the cold nodes, should I shrink it to a single shard containing all 500 GB? Does that depend on system RAM? Each of my machines has an 8-core CPU, 64 GB of RAM and a 2 TB SSD. What are the recommendations and implications?

We usually recommend shards no larger than 50 GB, and I would suggest you use that as the maximum on the hot nodes.
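To make the 50 GB guideline concrete for the 500 GB index planned above, here is a small arithmetic sketch. It assumes data is spread evenly across primary shards (default routing hashes the document ID, so this is usually close, but it is still an assumption).

```python
import math


def shards_needed(index_size_gb, max_shard_gb=50):
    """Smallest primary shard count that keeps each shard <= max_shard_gb,
    assuming data is spread evenly across shards."""
    return math.ceil(index_size_gb / max_shard_gb)


def shard_size_gb(index_size_gb, shards):
    """Average size of one primary shard."""
    return index_size_gb / shards


print(shard_size_gb(500, 5))   # 100.0, twice the 50 GB guideline
print(shards_needed(500))      # 10
print(shards_needed(250))      # 5
```

So staying within the guideline means either 10 primaries for a 500 GB index or rolling over at about 250 GB with 5 primaries.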

Once you move the index to the cold nodes, you can increase that size when you shrink. You'll have to test what size works best for you, though.

The only hard limit is that a single shard cannot hold more than about 2 billion documents.
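The exact per-shard limit is Lucene's maximum of 2,147,483,519 documents (Integer.MAX_VALUE minus 128). Whether a shrunk 500 GB shard gets near it depends on document size, so here is a rough check. The average on-disk bytes per document is an assumed input you would measure on your own index (for example store size divided by document count); nested documents count individually toward the limit, which this sketch ignores.

```python
LUCENE_MAX_DOCS = 2_147_483_519  # Integer.MAX_VALUE - 128


def estimated_docs(shard_size_gb, avg_bytes_per_doc):
    """Rough document count for a shard of the given on-disk size."""
    return (shard_size_gb * 1024**3) // avg_bytes_per_doc


def fits_in_one_shard(shard_size_gb, avg_bytes_per_doc):
    """True if the estimated document count stays within Lucene's limit."""
    return estimated_docs(shard_size_gb, avg_bytes_per_doc) <= LUCENE_MAX_DOCS


# Hypothetical average document sizes for a 500 GB single-shard index.
for avg in (1024, 200):
    print(avg, estimated_docs(500, avg), fits_in_one_shard(500, avg))
```

With ~1 KB log lines a single 500 GB shard stays well under the limit, but with very small (~200 byte) documents it would exceed it.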

Thanks for your reply. Does the 50 GB shard-size recommendation apply only to actively indexed shards? I got confused when you said that larger shards are not a problem on the cold nodes. Have you seen shards this large before? Can you explain everything that could be affected by having very large shards?

It's a general guideline, so it applies to any shard. The things you need to balance here are:

  • Search response times. Larger shards can slow down search.
  • Recovery/reallocation times. If you have to move several 100 GB+ shards around, that takes longer than moving a larger number of smaller shards.
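As a rough illustration of the recovery point, this sketch computes a lower bound on the time to copy one shard when recovery is throttled. It assumes the throttle is indices.recovery.max_bytes_per_sec, which defaults to 40mb in many Elasticsearch versions (check yours), and it ignores disk, network and concurrency effects, so real recoveries will differ.

```python
def min_recovery_seconds(shard_size_gb, max_mb_per_sec=40):
    """Lower bound on copying one shard when recovery is throttled to
    max_mb_per_sec; ignores disk, network and concurrency effects."""
    return shard_size_gb * 1024 / max_mb_per_sec


for size_gb in (10, 50, 100, 500):
    minutes = min_recovery_seconds(size_gb) / 60
    print(f"{size_gb:>4} GB shard: at least {minutes:.0f} min")
```

The per-shard time scales linearly with shard size, so one very large shard is a single long copy, while several smaller shards can be relocated to different nodes and finish individually much sooner.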
