What is the path to utilize BM25F with Elastic Search

conor · May 2, 2011, 2:08pm

Hi,

We are evaluating Elastic Search for a project. We require document
retrieval based on the BM25F algorithm.

This reference describes an integration of BM25/BM25F with Lucene:
http://nlp.uned.es/~jperezi/Lucene-BM25/

Can anyone comment on how/if this capability could be integrated with
Elastic Search?

Many thanks,

Conor

otisg · May 2, 2011, 9:11pm

Hi Conor,

I think you need to look one layer lower, to Lucene, where
support for things like BM25(F) needs to be added first.

Some pointers, including 2 JIRA issues:

http://search-lucene.com/?q=bm25&fc_project=Lucene&fc_project=Solr

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

conor · May 3, 2011, 12:59am

Thanks for the pointers Otis.

I understand the underlying Lucene dependency. The link I mentioned (http://nlp.uned.es/~jperezi/Lucene-BM25/) refers to an implementation of BM25(F) on top of Lucene. It provides a number of extensions to Lucene for Scorer, Query, Weight, and Similarity.

I think my question is better stated as: supposing one had extensions for Lucene that implemented BM25(F), how would they be passed through to Elastic Search?

It seems like the main elements are already there in the API (DSL) in terms of field-level boosting (so we could have a weighted sum of field-level rankings). But there would have to be a way to load the Lucene extensions.
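
For context on what such extensions would need to compute: BM25F as usually described does not sum per-field scores. It combines field-weighted, length-normalized term frequencies into one pseudo-frequency per term and applies the saturation step once. Below is a minimal Python sketch of that idea, assuming the simple BM25F variant with a single shared `b` and a smoothed IDF. The function name `bm25f_score`, its parameters, and the IDF form are illustrative assumptions, not part of Lucene, Elastic Search, or the linked implementation.

```python
import math


def bm25f_score(query_terms, doc, field_weights, avg_field_len,
                doc_freq, num_docs, k1=1.2, b=0.75):
    """Score one document with a simple BM25F variant (illustrative only).

    doc:            dict mapping field name -> list of tokens
    field_weights:  dict mapping field name -> boost for that field
    avg_field_len:  dict mapping field name -> average token count in the corpus
    doc_freq:       dict mapping term -> number of documents containing it
    """
    score = 0.0
    for term in query_terms:
        # Accumulate a weighted, length-normalized term frequency across fields.
        pseudo_tf = 0.0
        for field, weight in field_weights.items():
            tokens = doc.get(field, [])
            tf = tokens.count(term)
            if tf == 0:
                continue
            length_norm = 1.0 - b + b * len(tokens) / avg_field_len[field]
            pseudo_tf += weight * tf / length_norm
        if pseudo_tf == 0.0:
            continue
        # Assumed IDF form: smoothed so that it is always positive.
        df = doc_freq.get(term, 0)
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
        # Saturation is applied once per term, after the fields are combined.
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score


# Hypothetical two-field document and corpus statistics.
doc = {"title": ["bm25f", "ranking"],
       "body": ["lucene", "ranking", "with", "bm25f"]}
avg_len = {"title": 2.0, "body": 4.0}
doc_freq = {"bm25f": 1, "ranking": 2}

flat = bm25f_score(["bm25f"], doc, {"title": 1.0, "body": 1.0},
                   avg_len, doc_freq, num_docs=10)
boosted = bm25f_score(["bm25f"], doc, {"title": 3.0, "body": 1.0},
                      avg_len, doc_freq, num_docs=10)
```

Because the field boosts enter before saturation, raising the title weight increases the score with diminishing returns, which is the behaviour a plain weighted sum of per-field BM25 scores would not reproduce.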

If anyone can shed light on the path to pursue this, or if it has already been done, I would be much obliged.

With thanks,

Conor

Alberto_Paro_2 · May 3, 2011, 6:49am

If you read the two JIRA issues, you'll discover that the BM25F implementation you point to is buggy.

You should look at the Lucene flexscore branch (GSoC), which sits on top of Lucene trunk.

I started updating ES to the Lucene flexscore trunk, but I stopped for lack of time.

The index format in Lucene trunk changes often, so it's not safe for production.

kimchy · May 3, 2011, 4:23pm

Just to answer the question of how this can be integrated into ES (buggy or not), and based only on my scanning through the docs:

A custom similarity can be easily plugged into elasticsearch. (This is not documented, but when you get to it, I can point you to how to configure it.)
Custom queries can also be added to the query DSL. They will need to "know" how to be parsed, and then you can just use them. Check any of the query implementations in elasticsearch to see how it's done, and you can write a plugin that adds your own query process to the IndexQueryParserModule.

This is very high level; if you decide to do it, we can delve into the details.
conor · May 3, 2011, 11:27pm

Thanks Shay.

That addresses my question for now. Much obliged.

Conor
