For my application I've tested Lucene, but quickly ran into problems
with 10 million documents in one index on one server.
Query latency grew to several seconds, and for some keywords the
result sets were very large, to the point of running out of memory.
Now I'm looking for a better solution.
From my research, sharding seems to be the state of the art (Solr,
Elasticsearch) for working with large indices (> 20 million docs).
Why do I need sharding?
If I want to search over several shards (e.g. jan2010 up to dec2010),
I have to merge the results. Isn't that the same work as searching in a
How does sharding work for search? I think I understand how the
documents are hashed and distributed to different shards, but where are
Sorry for the beginner questions.
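
P.S. To make the merge question concrete, here is a minimal Python
sketch of how I imagine a search across several shards. It is not how
Solr or Elasticsearch actually implement it: the function names
(search_shard, search_all), the toy scoring (counting how often the
term occurs) and the sample data are all made up for illustration.
The point is that each shard returns only its own top k hits, so the
merge step sees at most k hits per shard, not the full result sets.

```python
import heapq
from itertools import islice


def search_shard(shard, term, k):
    """Return the k best (score, doc_id) hits of one shard, best first.

    Hypothetical toy scoring: the score is how often `term` occurs.
    """
    hits = []
    for doc_id, text in shard.items():
        score = text.split().count(term)
        if score > 0:
            hits.append((score, doc_id))
    # nlargest returns the hits sorted from highest to lowest score.
    return heapq.nlargest(k, hits)


def search_all(shards, term, k):
    """Query every shard, then merge the small per-shard top-k lists."""
    per_shard = [search_shard(shard, term, k) for shard in shards]
    # Each list is already sorted descending, so a k-way merge suffices.
    merged = heapq.merge(*per_shard, reverse=True)
    return list(islice(merged, k))


# Made-up sample data: one shard per month.
jan2010 = {"jan-1": "lucene index lucene", "jan-2": "solr shard"}
feb2010 = {"feb-1": "lucene", "feb-2": "lucene lucene lucene query"}

top_hits = search_all([jan2010, feb2010], "lucene", 2)
# top_hits == [(3, "feb-2"), (2, "jan-1")]
```

In this sketch the shards are queried one after another; in a real
setup I assume they would be queried in parallel on different servers.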