Index with few shards or index with many shards?

kamal · December 24, 2018, 8:50am

Hi
I have 3 nodes and 55TB of data to index. I can split it into 10 indices with 155 shards each, or into 110 indices with 10 shards each.
I don't know which option is better.
Can anyone help?
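A quick way to compare the two proposed layouts is to work out the total primary shard count, the average shard size and the primaries per node. This is a rough sketch that assumes 1TB = 1000GB and that the on-disk index is the same size as the raw data, which will not hold exactly in practice.

```python
# Compare the two layouts proposed above for 55TB of raw data on 3 nodes.
# Assumptions: 1TB = 1000GB, and on-disk index size equals raw data size.
RAW_DATA_GB = 55 * 1000
NODES = 3

def layout_stats(num_indices: int, shards_per_index: int) -> dict:
    """Total primary shards, average shard size (GB) and primaries per node."""
    total_shards = num_indices * shards_per_index
    return {
        "total_shards": total_shards,
        "avg_shard_gb": RAW_DATA_GB / total_shards,
        "shards_per_node": total_shards / NODES,
    }

for indices, shards in [(10, 155), (110, 10)]:
    s = layout_stats(indices, shards)
    print(f"{indices} indices x {shards} shards: "
          f"{s['total_shards']} shards, "
          f"~{s['avg_shard_gb']:.1f}GB each, "
          f"~{s['shards_per_node']:.0f} primaries per node")
```

Under these assumptions the 10 × 155 layout averages about 35.5GB per shard and the 110 × 10 layout exactly 50GB, so neither stays at or below the 30GB per-shard figure mentioned later in the thread.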

Christian_Dahlqvist · December 24, 2018, 9:04am

What is the use case? What kind of data do you have? How are you going to query/use it?

kamal · December 24, 2018, 11:01am

The data is logs. Most of the fields are structured and I only need to search on one field, but I want to use Kibana to draw different visualizations.
Each record is at most 500 bytes.
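The record size gives a lower bound on the document count, which can be checked against Lucene's per-shard document limit. This sketch assumes 55TB = 55 × 10^12 bytes and that every record is the 500-byte maximum, so the real record count will be higher if records are smaller.

```python
# Rough document-count estimate from the figures in this post.
# Assumptions: 55TB = 55 * 10**12 bytes, and every record is the
# 500-byte maximum, so min_records is a LOWER bound.
RAW_BYTES = 55 * 10**12
MAX_RECORD_BYTES = 500
SHARD_TARGET_BYTES = 30 * 10**9  # the 30GB per-shard size used later in the thread

# Lucene's hard limit on the number of documents in a single shard.
LUCENE_MAX_DOCS = 2_147_483_519

min_records = RAW_BYTES // MAX_RECORD_BYTES
docs_per_30gb_shard = SHARD_TARGET_BYTES // MAX_RECORD_BYTES

print(f"at least {min_records:,} records")
print(f"~{docs_per_30gb_shard:,} records per 30GB shard "
      f"(Lucene limit {LUCENE_MAX_DOCS:,})")
```

At roughly 60 million documents per 30GB shard, the document limit is far away, so shard size rather than document count is likely to be the binding constraint here.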

Christian_Dahlqvist · December 24, 2018, 11:21am

In that case the recommended best practice is to use time-based indices. Make sure you follow these guidelines on shard sizes and sharding practices. The following resources may also be useful:

https://www.elastic.co/webinars/optimizing-storage-efficiency-in-elasticsearch

https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
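With time-based indices, each record is written to an index whose name is derived from its timestamp. The sketch below uses a hypothetical `<prefix>-YYYY.MM.DD` (or hourly `<prefix>-YYYY.MM.DD.HH`) naming scheme computed in UTC; the prefix and the format are illustrative assumptions, not something prescribed in this thread.

```python
from datetime import datetime, timezone

# Hypothetical naming scheme: "<prefix>-YYYY.MM.DD" (daily) or
# "<prefix>-YYYY.MM.DD.HH" (hourly), always computed in UTC.
FORMATS = {"day": "%Y.%m.%d", "hour": "%Y.%m.%d.%H"}

def time_based_index(prefix: str, ts: datetime, granularity: str = "day") -> str:
    """Return the name of the index a record with timestamp ts belongs in."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return f"{prefix}-{ts.astimezone(timezone.utc).strftime(FORMATS[granularity])}"

ts = datetime(2018, 12, 24, 8, 50, tzinfo=timezone.utc)
print(time_based_index("logs", ts))          # logs-2018.12.24
print(time_based_index("logs", ts, "hour"))  # logs-2018.12.24.08
```

Requiring a timezone-aware timestamp avoids silently bucketing records by the indexing machine's local time.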

That said, only 3 nodes for 55TB of raw data sounds a bit small, especially if you intend to have replicas for high availability. I would, however, recommend running some tests to see how much data your particular hardware can hold.
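The effect of replicas on disk requirements can be sketched as raw size × expansion factor × (1 + replicas). The `expansion` factor here is a hypothetical ratio of on-disk index size to raw size; its real value depends on mappings and compression and is exactly what the suggested testing would measure.

```python
# Estimate disk needed for 55TB of raw data, with and without replicas.
# Assumption: `expansion` (on-disk index size / raw size) defaults to 1.0;
# the real ratio must be measured on your own data and mappings.
RAW_TB = 55.0
NODES = 3

def required_disk_tb(raw_tb: float, replicas: int, expansion: float = 1.0) -> float:
    """Primary data plus one full copy per replica."""
    return raw_tb * expansion * (1 + replicas)

for replicas in (0, 1):
    total = required_disk_tb(RAW_TB, replicas)
    print(f"replicas={replicas}: ~{total:.0f}TB total, ~{total / NODES:.1f}TB per node")
```

Each replica adds a full copy of the primary data, so a single replica doubles the storage requirement before any index overhead is considered.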

kamal · December 24, 2018, 1:02pm

Each node has 40 cores, 128GB RAM and 12 HDDs (6TB), configured as RAID 10 in three arrays.
I have read the documentation, but I still can't decide which option is better.
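The usable capacity of this hardware can be estimated from the drive count. This sketch assumes each of the 12 HDDs is 6TB and that RAID 10 mirrors every drive, so usable space is half the raw space. It also applies Elasticsearch's default low disk watermark of 85%, above which new shards are no longer allocated to a node.

```python
# Usable capacity of the hardware described above.
# Assumptions: each of the 12 HDDs is 6TB, and RAID 10 mirrors every
# drive, so usable space is half of raw space regardless of array layout.
DRIVES_PER_NODE = 12
DRIVE_TB = 6
NODES = 3
# Default of cluster.routing.allocation.disk.watermark.low in Elasticsearch.
LOW_WATERMARK = 0.85

usable_per_node = DRIVES_PER_NODE * DRIVE_TB / 2  # RAID 10 halves capacity
usable_cluster = usable_per_node * NODES
below_watermark = usable_cluster * LOW_WATERMARK

print(f"usable per node: {usable_per_node:.0f}TB")
print(f"usable cluster:  {usable_cluster:.0f}TB")
print(f"below the default low watermark: ~{below_watermark:.1f}TB")
```

If the index ends up roughly the same size as the raw data, 55TB with one replica (about 110TB) would not fit in about 108TB of usable space, which is consistent with the concern that 3 nodes may be too few.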

Christian_Dahlqvist · December 24, 2018, 1:18pm

What is not clear? Which options are you considering? Unsure how to apply time-based indices?

kamal · December 24, 2018, 1:50pm

Sorry to ask again; I am not an expert in this (although I have read all the Elastic docs).
I don't know which way to go:

Many indices, each with few shards.
Few indices, each with many shards.
In both architectures, each shard is at most 30GB.
Thanks
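With the same per-shard cap in both architectures, the total number of primary shards is fixed by the data volume; the choice only changes how those shards are grouped into indices. A minimal sketch, assuming 1TB = 1000GB and that the on-disk size equals the raw size:

```python
import math

# With a fixed maximum shard size, the total number of primary shards is set
# by the data volume, not by how shards are grouped into indices.
# Assumptions: 1TB = 1000GB, and on-disk size equals raw size.
RAW_DATA_GB = 55 * 1000
MAX_SHARD_GB = 30

def min_primary_shards(data_gb: float, max_shard_gb: float) -> int:
    """Smallest shard count that keeps every shard at or below max_shard_gb."""
    return math.ceil(data_gb / max_shard_gb)

def indices_needed(total_shards: int, shards_per_index: int) -> int:
    """Number of indices needed to hold at least total_shards shards."""
    return math.ceil(total_shards / shards_per_index)

total = min_primary_shards(RAW_DATA_GB, MAX_SHARD_GB)
print(f"at least {total} primary shards")
for shards_per_index in (10, 155):
    n = indices_needed(total, shards_per_index)
    print(f"{shards_per_index} shards/index -> {n} indices ({n * shards_per_index} shards)")
```

Both groupings land at roughly 1,840 to 1,860 primary shards, so the total shard count is essentially the same either way.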

Christian_Dahlqvist · December 24, 2018, 1:54pm

It depends on your data. How many different types of data? For each type, how much data do you have per day? What is the total time period covered by this data set?

kamal · December 24, 2018, 2:05pm

All the data is of the same type and covers a single 24-hour period. After indexing, no new data will be appended.

Christian_Dahlqvist · December 24, 2018, 4:30pm

Then time-based indices may not be applicable. Try to align the indices with how you query the data. The fewer shards you need to query, the better the performance I would expect. If you are always going to query the full data set, the total shard count may be more important than exactly how the shards are divided into indices.
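A toy model of this point: a search fans out to every shard of every index it targets, so aligning indices with queries reduces the shards hit, while a full-data-set query touches all of them. The index names, the naming scheme and the use of Python's `fnmatchcase` as a stand-in for Elasticsearch index-name wildcards are assumptions for illustration, and only primary shards are counted.

```python
from fnmatch import fnmatchcase
from typing import Dict

def build_layout(prefix: str, num_indices: int, shards_per_index: int) -> Dict[str, int]:
    """Map hypothetical index names to their primary shard counts."""
    return {f"{prefix}-{i:03d}": shards_per_index for i in range(num_indices)}

def shards_searched(layout: Dict[str, int], pattern: str) -> int:
    """Total primary shards in the indices whose names match the pattern."""
    return sum(n for name, n in layout.items() if fnmatchcase(name, pattern))

many_small = build_layout("logs", 110, 10)
few_large = build_layout("logs", 10, 155)

# Querying the full data set: only the total shard count matters.
print(shards_searched(many_small, "logs-*"), shards_searched(few_large, "logs-*"))
# A query that can be aligned with a single index.
print(shards_searched(many_small, "logs-007"), shards_searched(few_large, "logs-007"))
```

In this model the many-small layout searches 10 shards when a query maps to one index, versus 155 for the few-large layout; when everything is queried, the comparison reduces to 1,100 versus 1,550 total shards.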

system · January 21, 2019, 4:30pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
