About distributed IDF in ES


(Thiago Souza) #1

Hi Shay,

 One of the main limitations of building a distributed Solr was that
there would be no distributed IDF, so issue SOLR-1632 was raised to
address this limitation. How is this issue handled in ES?

Regards,
Thiago Souza


(Shay Banon) #2

I don't know how Solr handles this, but here is what elasticsearch has to
offer:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/search_type/
(check the dfs_... options).

-shay.banon

On Mon, Sep 27, 2010 at 3:57 PM, Thiago Souza tcostasouza@gmail.com wrote:

Hi Shay,

 One of the main limitations of building a distributed Solr was that
there would be no distributed IDF, so issue SOLR-1632 was raised to
address this limitation. How is this issue handled in ES?

Regards,
Thiago Souza


(jminard) #3

So, a double query: a first pass to compute the adjustments that normalize
IDF, then the actual search. It is also possible to generate an IDF-specific
lookup (i.e. an index) that is used locally on the master nodes rather than
fanning out a query. I guess whether it is worth managing an index-time list
instead of doing this at query time depends on the latency of the 2-pass
approach. Endeca, for example, has an aggregator index that contains
global/distributed IDF information for the master processes that start the
query chain.
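
To make the 2-pass idea concrete, here is a minimal sketch in Python. It is not elasticsearch's or Endeca's actual code. The shard contents, the function names (shard_stats, merge_stats, idf) and the smoothed IDF formula are all assumptions chosen for illustration. In pass 1 each shard reports its document count and per-term document frequencies, the coordinating node sums them, and the merged numbers give one global IDF that every shard could then use when scoring.

```python
import math
from collections import Counter

# Hypothetical shards: each one is a list of already-tokenized documents.
SHARDS = [
    [["distributed", "search"], ["search", "engine"], ["search", "index"]],
    [["distributed", "index"]],
]


def shard_stats(docs):
    """Pass 1 on a single shard: document count and per-term document frequency."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    return len(docs), df


def merge_stats(per_shard):
    """Coordinating node: sum the per-shard statistics into global ones."""
    total_docs = 0
    global_df = Counter()
    for n_docs, df in per_shard:
        total_docs += n_docs
        global_df.update(df)  # Counter.update adds counts
    return total_docs, global_df


def idf(n_docs, doc_freq):
    """One common smoothed IDF form (an assumption): 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(n_docs / (doc_freq + 1))


if __name__ == "__main__":
    term = "distributed"
    # Without the extra pass, each shard scores with its own local IDF.
    for i, docs in enumerate(SHARDS):
        n, df = shard_stats(docs)
        print(f"shard {i} local idf({term!r}) = {idf(n, df[term]):.3f}")
    # With the extra pass, every shard can be handed the same global value.
    n, df = merge_stats([shard_stats(docs) for docs in SHARDS])
    print(f"global idf({term!r}) = {idf(n, df[term]):.3f}")
```

The index-time alternative described above would amount to keeping the merged table around (and updating it as documents change) so that the coordinating node can look values up instead of asking every shard on each query.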


