How does search work in a single Elasticsearch shard (Lucene index)?

datval · November 30, 2015, 12:07pm

I am trying to understand how many shards I will need for my Elasticsearch cluster. The only recommendation I have found is that each shard shouldn't be larger than 30GB, but there is no explanation why. My main concern right now is search speed.

Does a single search on a single Elasticsearch shard use all CPU cores to rank results, or is it single-threaded? I wonder which will be faster: a single shard with 300GB of data, or 10 shards with 30GB each? Let's assume I have 16 cores.

How about sorting and filtering? Are operations on Fielddata performed in parallel?

nik9000 · November 30, 2015, 2:05pm

It's a useful rule of thumb, though it's not a hard requirement. The size recommendation exists because:

Large shards take a long time to relocate or recover.
A shard can contain no more than around two billion documents.
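To put a number on the document limit: Lucene caps a single index at `Integer.MAX_VALUE - 128` documents (`IndexWriter.MAX_DOCS`), which is where the "around two billion" figure comes from. Here is a minimal sketch of the resulting lower bound on primary shard count. The function name `min_primary_shards` is made up for illustration, and it assumes documents are spread evenly across shards, which routing does not guarantee.

```python
# Lucene's hard per-index (per-shard) document cap: Integer.MAX_VALUE - 128.
LUCENE_MAX_DOCS = (2**31 - 1) - 128


def min_primary_shards(total_docs: int) -> int:
    """Lower bound on primary shards needed to stay under the Lucene cap.

    Hypothetical helper: assumes documents are distributed evenly,
    so treat the result as a floor, not a recommendation.
    """
    if total_docs < 0:
        raise ValueError("total_docs must be non-negative")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return max(1, -(-total_docs // LUCENE_MAX_DOCS))


if __name__ == "__main__":
    for docs in (500_000_000, 3_000_000_000, 10_000_000_000):
        print(f"{docs:>14,} docs -> at least {min_primary_shards(docs)} shard(s)")
```

This only covers the hard limit. Recovery and relocation time will usually push you toward smaller shards well before you get near it.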

In general, a search on one shard executes on one thread.

[quote="datval, post:1, topic:35892"]
I wonder which will be faster to have a single shard with 300GB of data or 10 shards with 30GB each?
[/quote]

Best is ten 30GB shards on ten machines. Beyond that, it's worth experimenting, depending on your workload. Are you issuing one search at a time? If so, maybe 2 or 3 shards per machine is better. That depends on lots of things though, like how selective your queries are. Usually fewer shards are better. If you have lots of simultaneous searches, then 1 shard per node is the best you are going to get.
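A rough way to reason about this trade-off for a single query on a single node is the toy model below. It is not a description of Elasticsearch internals. It assumes one thread per shard (as stated above), per-shard work that scales linearly with shard size, and a fixed per-shard cost for setup and merging results. The function name and the `seconds_per_gb` and `per_shard_overhead` values are invented for illustration, so measure your own.

```python
import math


def estimate_single_query_latency(total_gb: float, num_shards: int, cores: int,
                                  seconds_per_gb: float = 0.1,
                                  per_shard_overhead: float = 0.05) -> float:
    """Toy latency estimate for ONE query on ONE node, with no other load.

    Assumptions (all hypothetical): each shard is searched by one thread,
    search time is linear in shard size, and every shard adds a fixed
    overhead. Shards beyond the core count run in extra "waves".
    """
    if num_shards < 1 or cores < 1:
        raise ValueError("num_shards and cores must be >= 1")
    shard_gb = total_gb / num_shards
    waves = math.ceil(num_shards / cores)
    return waves * (per_shard_overhead + shard_gb * seconds_per_gb)


if __name__ == "__main__":
    for shards in (1, 10, 16, 32):
        latency = estimate_single_query_latency(300, shards, cores=16)
        print(f"{shards:>2} shard(s): ~{latency:.2f}s")
```

In this model, splitting helps a lone query until the shard count reaches the core count, after which the extra waves and overhead make it slower again. With many simultaneous searches the cores are already busy, so the extra shards only add overhead, which is why fewer shards usually win.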

Indexing, btw, is a highly parallelized process.

datval · November 30, 2015, 2:18pm

Thanks Nik,

Unfortunately we have a limited number of machines, so I wanted to get stats for a single node and then generalize. In most cases there will be no parallel queries, but some queries are taking more than 30 seconds, and I am wondering whether I can improve that without additional hardware. Your comment that one shard = one thread is very helpful in this respect. I ran some tests and couldn't really get any meaningful results (I tried term queries and sorting, since sorting seems to be a very expensive operation). My prediction was that I would get the best search performance when the number of shards equals the number of cores, but that was not the case.

Thanks for sharing this, it is very helpful.
