Signs of too few shards? Number of documents per shard?

Hi all,

I'm experiencing performance challenges with an ES cluster. The cluster consists of 6 data nodes (3 hot with ~30GB heaps, 3 warm with ~15GB heaps). It holds around 4TB of data in total, roughly 20,000,000,000 (20 billion) docs. Each index has 6 shards, and each shard holds 25-35GB of data.
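For context, here is a back-of-the-envelope calculation of what those figures imply per shard (a rough sketch; the 30GB average shard size is an assumption within the stated 25-35GB range):

```python
# Back-of-the-envelope check of the cluster figures above
# (all numbers are taken from the post, not measured live).
TOTAL_DOCS = 20_000_000_000      # ~20 billion docs
TOTAL_DATA_GB = 4_000            # ~4TB total
AVG_SHARD_GB = 30                # shards reported at 25-35GB each

approx_shards = TOTAL_DATA_GB // AVG_SHARD_GB     # ~133 shards cluster-wide
docs_per_shard = TOTAL_DOCS // approx_shards      # ~150 million docs/shard

print(f"~{approx_shards} shards, ~{docs_per_shard:,} docs per shard")
```

So even though the shards are modest in bytes, each one carries on the order of 150 million (mostly tiny, nested) Lucene documents.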

The overwhelming majority of the docs are very small, effectively key-value pairs. They are nested docs, but each one is still indexed as a separate document in Lucene. I have read on numerous occasions that the optimal shard size is "a few GB to a few tens of GB", or "less than but close to 50GB".

I see less info about the ideal number of documents per shard. I know that Lucene (i.e. a single shard) has an upper limit of about 2 billion docs. But is performance really unaffected by the number of docs below that limit? My empirical findings suggest that the number of docs does in fact matter: search threads consistently appear in hot threads even when no indexing is happening, while resource usage remains low (CPU at ~20%, memory at ~50%).
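For reference, Lucene's hard limit is 2,147,483,519 documents per shard (Integer.MAX_VALUE minus 128), and nested docs each count toward it. A quick margin check against the ~150M docs/shard that the post's figures work out to (an assumed average, not a measurement):

```python
# Lucene's hard per-shard limit; nested docs count individually.
LUCENE_MAX_DOCS = 2_147_483_519   # Integer.MAX_VALUE - 128

# Assumed average from the cluster figures in the post (~20B docs
# spread over ~133 shards), not a number read from a live cluster.
docs_per_shard = 150_000_000

headroom = LUCENE_MAX_DOCS / docs_per_shard
print(f"{headroom:.1f}x below the Lucene limit")
```

So the hard limit itself is not the concern here; the question is whether per-shard search cost grows with document count well before that limit is reached.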

Could this indicate that for my use case, splitting the indexes into more, smaller shards might improve search latency? Under what circumstances should one suspect that the number of shards in use is too small?


Indeed, having more, smaller shards allows search to be parallelized and results to come back faster, but only if a node has enough threads in its search thread pool. If the node's search threads are already fully and constantly occupied, adding more shards will not help search.


Thanks very much for the reply, @mayya! This is very valuable insight :pray:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.