What is the path to utilize BM25F with Elastic Search

Hi,

We are evaluating Elastic Search for a project. We require document
retrieval based on the BM25F algorithm.

This reference describes an integration of BM25/BM25F with Lucene:
http://nlp.uned.es/~jperezi/Lucene-BM25/

Can anyone comment on how/if this capability could be integrated with
Elastic Search?

Many thanks,

Conor

Hi Conor,

I think you need to look one layer lower, at Lucene, where
support for things like BM25(F) needs to be added first.

Some pointers, including 2 JIRA issues:

http://search-lucene.com/?q=bm25&fc_project=Lucene&fc_project=Solr

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

(In reply to Conor's message of May 2, 10:08 am.)

Thanks for the pointers Otis.

I understand the underlying Lucene dependency. This link I mentioned (http://nlp.uned.es/~jperezi/Lucene-BM25/) refers to an implementation of BM25(F) on top of Lucene. It provides a number of extensions to Lucene for Scorer, Query, Weight, and Similarity.

I think my question is better stated as: supposing one had extensions for Lucene that implemented BM25(F), how would they be passed through to Elastic Search?

It seems like the main elements are already there in the API (DSL) in terms of field-level boosting (so we could have a weighted sum of field-level rankings). But there would have to be a way to load the Lucene extensions.

If anyone can shed light on the path to pursue this, or if it has already been done, I would be much obliged.

With thanks,

Conor
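
A note on the "weighted sum of field-level rankings" idea above: as far as I understand the simple BM25F formulation, it first combines the weighted, length-normalized term frequencies across fields and then applies term-frequency saturation once per term, which is not the same thing as summing boosted per-field BM25 scores. Below is a minimal Python sketch on a toy in-memory corpus. The corpus, the field weights, the `b` values, `k1`, the IDF variant, and all function names are assumptions made up for illustration; nothing here uses Lucene or Elastic Search APIs.

```python
import math

# Toy corpus (hypothetical): each document maps a field name to its tokens.
DOCS = [
    {"title": ["lucene", "scoring"], "body": ["bm25", "ranking", "in", "lucene", "lucene"]},
    {"title": ["elastic", "search"], "body": ["distributed", "search", "on", "lucene"]},
    {"title": ["cooking"], "body": ["pasta", "recipes", "and", "more", "pasta"]},
]
WEIGHTS = {"title": 2.0, "body": 1.0}  # assumed field boosts
B = {"title": 0.5, "body": 0.75}       # assumed per-field length normalization
K1 = 1.2                               # assumed saturation parameter

# Average field length over the corpus, per field.
AVG_LEN = {f: sum(len(d[f]) for d in DOCS) / len(DOCS) for f in WEIGHTS}


def idf(term):
    # A document counts once if the term occurs in any of its fields.
    # The "+1" inside the log is one common variant that keeps IDF positive.
    df = sum(1 for d in DOCS if any(term in d[f] for f in WEIGHTS))
    n = len(DOCS)
    return math.log(1 + (n - df + 0.5) / (df + 0.5))


def norm_tf(doc, field, term):
    # Term frequency in one field, normalized by that field's relative length.
    tf = doc[field].count(term)
    return tf / (1 - B[field] + B[field] * len(doc[field]) / AVG_LEN[field])


def bm25f(doc, query):
    # BM25F-style: combine weighted field frequencies, then saturate once.
    score = 0.0
    for term in query:
        pseudo_tf = sum(WEIGHTS[f] * norm_tf(doc, f, term) for f in WEIGHTS)
        score += idf(term) * pseudo_tf / (K1 + pseudo_tf)
    return score


def weighted_sum_of_fields(doc, query):
    # Alternative: saturate each field separately, then sum the boosted scores.
    score = 0.0
    for term in query:
        for f in WEIGHTS:
            x = norm_tf(doc, f, term)
            score += WEIGHTS[f] * idf(term) * x / (K1 + x)
    return score


if __name__ == "__main__":
    for i, doc in enumerate(DOCS):
        print(i, bm25f(doc, ["lucene"]), weighted_sum_of_fields(doc, ["lucene"]))
```

In the BM25F-style function a single query term can never contribute more than its IDF, however many fields it matches, while the weighted sum lets a term that matches several fields accumulate a separate saturated score per field. That difference is why field-level boosting in the query DSL alone would not reproduce BM25F.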

If you read the two JIRA issues, you'll discover that the BM25F implementation you point to is buggy.

You should look at the Lucene flexscore branch (GSoC), which sits on top of Lucene trunk.

I started updating ES to the Lucene flexscore trunk, but I stopped for lack of time.

The index format in Lucene trunk changes often, so it's not safe for production.

(In reply to Conor's message of May 3, 2011, at 02:59.)

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/What-is-the-path-to-utilize-BM25F-with-Elastic-Search-tp2889935p2892048.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

Just to answer the question of how this can be integrated into ES (buggy or not), based only on my scanning through the docs:

  1. A custom similarity can easily be plugged into Elasticsearch (this is not documented, but when you get to it, I can point you to how to configure it).
  2. Custom queries can also be added to the query DSL. They need to "know" how to be parsed; after that, you just use them. Check any of the query implementations in Elasticsearch to see how it's done, and you can write a plugin that adds your own query process to the IndexQueryParserModule.

This is very high level; if you decide to do it, we can delve into the details.
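
To make point 1 a little more concrete: in later Elasticsearch releases, as far as I know, a similarity is declared per index under `index.similarity` and then referenced by name from the field mappings. The sketch below only builds such an index-creation body as a Python dict and prints it as JSON. It assumes those later releases rather than the version discussed in this thread; `tuned_bm25` and the field names `title` and `body` are hypothetical; and it configures plain per-field BM25, not BM25F.

```python
import json

# Index-creation body, assuming the `index.similarity` settings of later
# Elasticsearch releases (not the 2011-era mechanism discussed here).
# "tuned_bm25" is a hypothetical name for the configured similarity.
index_body = {
    "settings": {
        "index": {
            "similarity": {
                "tuned_bm25": {"type": "BM25", "k1": 1.2, "b": 0.75}
            }
        }
    },
    "mappings": {
        "properties": {
            # Each field refers to the similarity by its configured name.
            "title": {"type": "text", "similarity": "tuned_bm25"},
            "body": {"type": "text", "similarity": "tuned_bm25"},
        }
    },
}

if __name__ == "__main__":
    # The JSON printed here would be the request body for creating the index.
    print(json.dumps(index_body, indent=2))
```

A custom Similarity class or a custom query type, as described in the two points above, would still need a plugin; this fragment only covers tuning a built-in similarity.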

(In reply to Alberto Paro's message of Tuesday, May 3, 2011 at 9:49 AM.)

Thanks Shay.

That addresses my question for now. Much obliged.

Conor