Bulk Indexing performance 10 indices of 1 shard vs 1 index of 10 shards

jakesjohn · July 24, 2017, 11:43pm

From the docs, I read that query performance is the same whether you query 10 indices of 1 shard each or 1 index of 10 shards.

What about bulk indexing performance in the same situation: bulk indexing into 10 indices of 1 shard vs. 1 index of 10 shards?

Logs flow into the system continuously throughout the day. Queries have to be supported over the whole data set (a year's worth), but they will be infrequent. I am planning to use the rollover API (https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html) to roll over the index when it becomes large.

With the rollover API, there are trade-offs to figure out:

How much data should be kept in a single index before rolling over to a new index?

Also, how many shards should an index have even when rollover is configured? For example, keeping all the data in a single shard and rolling over vs. keeping the data in 5 shards and rolling over.

I would like to get recommendations as a baseline to start my tests.

warkolm · July 24, 2017, 11:58pm

More shards usually means faster writes (unless you go crazy).

Given you are using rollover you can also use _shrink, so you can start with more shards and then reduce for longer term storage.
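
A minimal sketch of the rollover-then-shrink flow described above, written as Python helpers that only build the REST requests (nothing is sent). The alias `logs_write`, the index and node names, and the condition values are hypothetical placeholders; the endpoints and settings follow the rollover and shrink API documentation as I understand it, so check them against your Elasticsearch version.

```python
import json

def rollover_request(alias, max_age="7d", max_docs=100_000_000):
    """Build a rollover call for a write alias (condition values are placeholders)."""
    body = {"conditions": {"max_age": max_age, "max_docs": max_docs}}
    return "POST", f"/{alias}/_rollover", body

def shrink_requests(source, target, node_name, target_shards=1):
    """Build the two calls needed to shrink an index that is no longer written to."""
    # Step 1: block writes and gather a copy of every shard on one node.
    prepare = ("PUT", f"/{source}/_settings", {
        "settings": {
            "index.routing.allocation.require._name": node_name,
            "index.blocks.write": True,
        }
    })
    # Step 2: shrink into a new index with fewer primary shards.
    shrink = ("POST", f"/{source}/_shrink/{target}", {
        "settings": {"index.number_of_shards": target_shards}
    })
    return [prepare, shrink]

if __name__ == "__main__":
    calls = [rollover_request("logs_write")]
    calls += shrink_requests("logs-000001", "logs-000001-shrunk", "cold-node-1")
    for method, path, body in calls:
        print(method, path, json.dumps(body))
```

The target shard count has to be a factor of the source shard count, so an index with 5 primary shards can only be shrunk to 1.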

jakesjohn · July 26, 2017, 10:49pm

I have a high incoming rate and concurrent searches over the entire data set (hot and cold). I am not much concerned about search latency, but I have to ensure cluster stability (even during extreme searches) and a high indexing rate. As you suggested, I will have an index with 5 shards and index up to 500 GB (approx. 100 GB in each shard) on hot nodes. Later, when I roll over to cold nodes, should I reduce the total number of shards to a single shard containing 500 GB? Does this depend on system RAM? Each of my machines has an 8-core CPU, 64 GB of RAM, and a 2 TB SSD. What are the recommendations and implications?

warkolm · July 26, 2017, 11:05pm

We usually recommend shards no larger than 50GB, and I would suggest you use that as the max on the hot nodes.

Once you move it to cold then you can increase that size when you shrink. You'll have to test what size is best for you though.

The only limitation is that you cannot have more than 2 billion docs in a single shard.
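
To make the arithmetic concrete, here is a rough sizing sketch under the figures in this thread: a 50 GB per-shard guideline and a ceiling of about 2 billion docs per shard. The function names are made up for illustration, and the doc check assumes documents are spread evenly across shards.

```python
import math

MAX_SHARD_GB = 50                    # guideline from this thread, not a hard limit
MAX_DOCS_PER_SHARD = 2_000_000_000   # approximate per-shard ceiling from this thread

def shards_needed(index_size_gb, max_shard_gb=MAX_SHARD_GB):
    """Smallest primary shard count that keeps each shard at or under the guideline."""
    return max(1, math.ceil(index_size_gb / max_shard_gb))

def valid_shrink_targets(source_shards):
    """Shrink targets must be factors of the source shard count."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]

def fits_doc_limit(total_docs, shards):
    """True if an even spread keeps every shard under the doc ceiling."""
    return math.ceil(total_docs / shards) <= MAX_DOCS_PER_SHARD

if __name__ == "__main__":
    print(shards_needed(500))                 # 500 GB index on hot nodes
    print(valid_shrink_targets(10))           # options when shrinking 10 shards
    print(fits_doc_limit(3_000_000_000, 1))   # 3 billion docs in one shard
```

For the 500 GB index discussed here, this gives 10 primary shards on the hot nodes, which can later be shrunk to 5, 2, or 1, provided the doc count per shard stays under the ceiling.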

jakesjohn · July 26, 2017, 11:55pm

Thanks for your reply. Is the 50 GB shard size recommendation a limit only for actively indexed shards? I got confused when you said that it is not a problem on cold nodes. Have you seen shards this large before? Can you please tell me what could be affected by having huge shards?

warkolm · July 27, 2017, 12:07am

It's a general guide so it applies to any shard. The things that you need to balance here are:

Search response times. Larger shards can slow down search.
Recovery/reallocation times. If you have to move multiple 100GB+ shards around, each one takes longer to relocate than a smaller shard would.
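
As a back-of-the-envelope illustration of the recovery point, the sketch below estimates how long a single shard copy takes at a fixed transfer rate. The 40 MB/s figure is an assumed throttle, not a measurement; check your cluster's `indices.recovery.max_bytes_per_sec` setting and real network and disk throughput before relying on it.

```python
def transfer_minutes(shard_gb, rate_mb_per_sec=40.0):
    """Minutes to copy one shard at a constant rate (ignores all other overhead)."""
    return shard_gb * 1024 / rate_mb_per_sec / 60

if __name__ == "__main__":
    for size_gb in (50, 100, 500):
        print(f"{size_gb} GB shard: about {transfer_minutes(size_gb):.0f} min")
```

The total bytes moved are the same however the data is split, but smaller shards finish individually sooner and can be recovered in parallel across nodes, whereas one large shard is a single long transfer.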

system · August 24, 2017, 12:07am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
