How search works in a single Elasticsearch shard(Lucene index)?

I am trying to understand how many shards I will need for my Elasticsearch cluster. Only recommendation I found is that each shards shouldn't be larger than 30GB, but there is no explanation why. My main concern is search speed right now.

Is a single search for a single Elasticsearch shard uses all CPU cores to rank results or it is a single threaded? I wonder which will be faster to have a single shard with 300GB of data or 10 shards with 30GB each? Lets assume I have 16 cores.

How about sorting and filtering? Are operations on Fielddata performed in parallel?

Its a useful rule of thumb though its not really required. The size recommendation is because:

  1. Large shards take a long time to relocate or recover.
  2. A shard can contain no more than around two billion documents.

One shard executes on one thread in general.[quote="datval, post:1, topic:35892"]
I wonder which will be faster to have a single shard with 300GB of data or 10 shards with 30GB each?

Best is 10 30GB shards on ten machines. Beyond that its worth experimenting depending on your workload. Are you issuing one search at a time? Maybe 2 or 3 shards per machine is better. That depends on lots of stuff though, like how selective your queries are. Usually fewer shards are better, In the use case where you have lots of simultaneous searches then 1 shard per node is the best you are going to get.

Indexing, btw, is a highly parallelized process.

Thanks Nik,

Unfortunately we have limited number of machines, so wanted to get stats for a single node and then generalize. In most cases there will be no parallel queries, but some queries are taking more than 30 second and thinking if I can improve it without additional hardware. Your comment on one shard = one thread is very helpful in this terms. I ran some tests and couldn't really get any meaningful results(Tried term queries and sorting as it seems very expensive operation). My prediction was that when number of shards equals number of cores, I would get the best search performance, but it was not a case.

Thanks for sharing this, it is very helpful.