Elasticsearch sharding

mbx · February 14, 2011, 2:04pm

Hi,
for my application i've tested Lucene, but quickly had problems with

10 million documents in one index on one server.
Query latency growed up to several seconds and for some keywords
result sets had been very large - up to out of memory.
Now i'm looking for a better solution.
During my research it seems that sharding is state of the art (Solr,
Elasticsearch) to work with large indices ( >20 Mio docs).

My Questions:
Why do i need sharding?
If i want to search over several shards (e.g. jan2010 upto dec2010)
i've to merge the results. Isn't it the same work than searching in a
large 2010-index?

How does sharding work for search? I think understood how the
documents are hashed and distributed to different shards but where are
they merged?

Sorry for the beginner questions.
Tank you!
-mbx

Karussell1 · February 14, 2011, 6:51pm

Results are merged when querying. there are different query types.
take a look into:

http://www.marcsturlese.com/2010/02/12/elasticsearch/

Hitting several indices instead of one allows you to use several CPUs
or to put the shards on different servers.

More on that subject should be explained by the author of ES (i'm
not an expert ...)

On 14 Feb., 15:04, mbx myze...@googlemail.com wrote:

Hi,
for my application i've tested Lucene, but quickly had problems with>10 million documents in one index on one server.

Query latency growed up to several seconds and for some keywords
result sets had been very large - up to out of memory.
Now i'm looking for a better solution.
During my research it seems that sharding is state of the art (Solr,
Elasticsearch) to work with large indices ( >20 Mio docs).

My Questions:
Why do i need sharding?
If i want to search over several shards (e.g. jan2010 upto dec2010)
i've to merge the results. Isn't it the same work than searching in a
large 2010-index?

How does sharding work for search? I think understood how the
documents are hashed and distributed to different shards but where are
they merged?

Sorry for the beginner questions.
Tank you!
-mbx

kimchy · February 16, 2011, 2:03am

Thats the gist of it. Spreading a the search over multiple shards (that exist on several machines) will mean searching over a smaller index (per shard), and then joining the results back.
On Monday, February 14, 2011 at 8:51 PM, Karussell wrote:

Results are merged when querying. there are different query types.
take a look into:

http://www.marcsturlese.com/2010/02/12/elasticsearch/

Hitting several indices instead of one allows you to use several CPUs
or to put the shards on different servers.

More on that subject should be explained by the author of ES (i'm
not an expert ...)

On 14 Feb., 15:04, mbx myze...@googlemail.com wrote:

Hi,
for my application i've tested Lucene, but quickly had problems with>10 million documents in one index on one server.

Query latency growed up to several seconds and for some keywords
result sets had been very large - up to out of memory.
Now i'm looking for a better solution.
During my research it seems that sharding is state of the art (Solr,
Elasticsearch) to work with large indices ( >20 Mio docs).

My Questions:
Why do i need sharding?
If i want to search over several shards (e.g. jan2010 upto dec2010)
i've to merge the results. Isn't it the same work than searching in a
large 2010-index?

How does sharding work for search? I think understood how the
documents are hashed and distributed to different shards but where are
they merged?

Sorry for the beginner questions.
Tank you!
-mbx

Topic		Replies	Views
Search Performance Elasticsearch	9	374	July 6, 2017
Very slow searches on a very large shard Elasticsearch	4	2108	February 23, 2018
When is one shard enough Elasticsearch	6	1308	July 6, 2017
Is shard splitting supported in Elastic search, any alternate Elasticsearch	9	446	July 6, 2017
Heavy indexing cause severe delay for searching Elasticsearch	12	523	July 6, 2017

Elasticsearch sharding

Related topics