Shards too large to archive data

Hi Elasticsearch:

When I started taking care of one of our clusters, I found that it is far too large.
Shards: 5
Nodes: 5
Replicas: 1
Data per shard: 400GB
Total documents: 5 billion
Version: 1.4

I tried to run a simple search on the cluster, but the query just runs until it times out.
I also tried elasticdump from GitHub, but it got stuck while running, and the search latency in my monitoring spiked.
The data is time-based. Is there a way to archive it? Any ideas?

Query and aggregation latency depends on shard size, as each query or aggregation runs single-threaded over each shard. Multiple shards and queries can, however, be processed in parallel. Shards this large can therefore result in poor query performance.

If you have time-based data, we generally recommend that you use time-based indices.
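
For illustration, here is a minimal sketch of writing to monthly indices with the Python client, assuming a 1.x-era elasticsearch-py, an `events-*` naming scheme, and a `@timestamp` field (all placeholders, not names from your cluster):

```python
# Minimal sketch: route each document to a monthly index such as "events-2015.06".
from datetime import datetime
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # hypothetical endpoint

doc = {"message": "example event", "@timestamp": datetime.utcnow()}
index_name = "events-%s" % doc["@timestamp"].strftime("%Y.%m")

# doc_type is required by the 1.x API; it was removed in later versions.
es.index(index=index_name, doc_type="event", body=doc)
```

This way each month's data lives in its own index, and old months can be archived or deleted wholesale instead of being carved out of one huge index.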

As you are on a very old version, I would also recommend upgrading.

Having said that, I don't think there is any easy way out, and you will need to reindex your data into new indices. If even simple queries and scroll requests time out, that may however be difficult.
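
As a rough sketch of that reindexing using scan/scroll plus bulk from elasticsearch-py (the source index name `big-index`, the `@timestamp` field, and the date range are assumptions; the `filtered` query syntax is the 1.x form):

```python
# Minimal sketch: reindex one month of data into a monthly index via scan/scroll + bulk.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan, bulk

es = Elasticsearch(["http://localhost:9200"])  # hypothetical endpoint

# 1.x "filtered" query: match only documents from January 2015.
query = {"query": {"filtered": {"filter": {"range": {
    "@timestamp": {"gte": "2015-01-01", "lt": "2015-02-01"}}}}}}

def actions():
    # scan() drives the scroll API under the hood and streams hits.
    for hit in scan(es, index="big-index", query=query):
        yield {
            "_index": "events-2015.01",  # monthly target index
            "_type": hit["_type"],
            "_id": hit["_id"],
            "_source": hit["_source"],
        }

bulk(es, actions())
```

Running one month at a time keeps scroll contexts short-lived, which matters if long scrolls are already timing out.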


Thanks Christian:

I am using a filter to extract the data monthly, and it seems to work for my case.
I am going to build new indices based on month. By my calculation, if I build one index per month with 5 shards each, every shard will have about 60GB of data and roughly 5 million documents.
Do you think that shard is too big? What is the recommended shard size?

The ideal shard size depends on the use case, but we generally recommend keeping it below 50GB. You do not have to go with 5 shards per index; if those would get too large, 8 or 10 shards per monthly index may be more suitable.
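
To make the arithmetic concrete: 5 shards at about 60GB each is roughly 300GB per monthly index, and 300GB / 50GB = 6 shards sits right at the recommended ceiling, so 8 or 10 shards leaves headroom. Here is a minimal sketch of fixing the shard count once for all future monthly indices via an index template (the template name and `events-*` pattern are assumptions):

```python
# Minimal sketch: every new index matching "events-*" gets 8 primary shards.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # hypothetical endpoint

es.indices.put_template(name="events", body={
    "template": "events-*",  # 1.x template pattern syntax
    "settings": {"number_of_shards": 8, "number_of_replicas": 1},
})
```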

Thanks Christian. That will solve my case for now.

Also consider upgrading, at least to Elasticsearch 1.7.
