Hi,
for my application I've tested Lucene, but quickly ran into problems with
>10 million documents in one index on one server.
Query latency grew to several seconds, and for some keywords
the result sets were very large, to the point of running out of memory.
Now I'm looking for a better solution.
From my research it seems that sharding is the state of the art (Solr,
Elasticsearch) for working with large indices (>20 million docs).
My questions:
Why do I need sharding?
If I want to search over several shards (e.g. jan2010 up to dec2010),
I have to merge the results. Isn't that the same work as searching in one
large 2010 index?
How does sharding work for search? I think I understood how the
documents are hashed and distributed to different shards, but where are
the results merged?
That's the gist of it. Spreading the search over multiple shards (which live on several machines) means each shard searches a smaller index, and the per-shard results are then merged back together.
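To make the "search smaller indices, then merge" idea concrete, here is a minimal sketch in Python. It is not how Lucene, Solr or Elasticsearch implement it; the shard layout, the term-count scoring and the names `search_shard` and `search_all` are all made up for illustration. The point it tries to show is that each shard only hands back its own top k hits, so the merging node never holds more than k hits per shard, and the shards can work in parallel.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy shards: each maps doc_id -> text. A real shard would be
# a separate Lucene index, usually on its own machine.
SHARDS = [
    {"jan-1": "lucene lucene index", "jan-2": "solr"},
    {"feb-1": "lucene shard"},
    {"mar-1": "lucene lucene lucene"},
]

def search_shard(shard, term, k):
    """Return this shard's local top-k hits as (score, doc_id) tuples.

    The score is just the term count, a stand-in for real relevance scoring.
    """
    hits = ((text.lower().split().count(term), doc_id)
            for doc_id, text in shard.items())
    return heapq.nlargest(k, (hit for hit in hits if hit[0] > 0))

def search_all(shards, term, k):
    """Query every shard in parallel, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, term, k), shards))
    # The merge step sees at most k hits per shard, never the full result set.
    return heapq.nlargest(k, (hit for part in partials for hit in part))

if __name__ == "__main__":
    for score, doc_id in search_all(SHARDS, "lucene", 2):
        print(doc_id, score)
```

So the total work is similar to searching one big index, but it is split across machines, and the merge itself is cheap because it only looks at the best few hits from each shard.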
On Monday, February 14, 2011 at 8:51 PM, Karussell wrote:
Results are merged at query time. There are different query types;
take a look at: