To Interchange the Similarity Algorithm

Fabio_Figueiredo · August 27, 2013, 8:20pm

Hi people,

I've read in the Elasticsearch docs [1] that an index is bound to a
Similarity Algorithm (like BM25, DFR, TF/IDF or IB).

However, I'd like to know whether it would be possible to specify a Similarity
Algorithm at runtime, because I have some evidence that interchanging the
algorithm can bring better results if we use a machine learning technique
to weight the score of each Similarity Algorithm based on the
circumstances.

In other words, would it be possible, given a sample (e.g. the first 500
documents returned by BM25), to choose and run other Similarity Algorithms
just over those 500 documents instead of processing the whole index again?

[1] http://www.elasticsearch.org/guide/reference/index-modules/similarity/

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

simonw_2 · August 27, 2013, 8:48pm

You can configure similarity per field, not per index. However, statistics for
certain algorithms are written at index time, so you can't change the
similarity at runtime; you would need to index the field multiple times.

simon
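
For illustration only, here is a rough sketch of what "index the field multiple times" could look like, written as a Python helper that builds the index creation body. It assumes the mapping syntax of more recent Elasticsearch releases (a `text` field with sub-fields under `fields`, custom similarities under `index.similarity`), which differs from the syntax available when this thread was written. The function name `build_index_body`, the similarity names `my_dfr` and `my_ib`, and the sub-field names are made up for the example, so check the parameters against the documentation for your version.

```python
import json


def build_index_body(field="body"):
    """Build index settings and mappings that index one text field
    under several similarities (hypothetical names, assumed syntax)."""
    return {
        "settings": {
            "index": {
                "similarity": {
                    # Custom similarity definitions; parameter values are examples.
                    "my_dfr": {
                        "type": "DFR",
                        "basic_model": "g",
                        "after_effect": "l",
                        "normalization": "h2",
                    },
                    "my_ib": {
                        "type": "IB",
                        "distribution": "ll",
                        "lambda": "df",
                        "normalization": "h2",
                    },
                }
            }
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "similarity": "BM25",  # main field scored with BM25
                    "fields": {
                        # Same text indexed again, each copy with its own similarity.
                        "dfr": {"type": "text", "similarity": "my_dfr"},
                        "ib": {"type": "text", "similarity": "my_ib"},
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

With a mapping along these lines, a query against `body` would be scored with BM25, while queries against `body.dfr` or `body.ib` would use the other similarities, at the cost of extra index size.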

Israel_Ekpo · August 28, 2013, 1:40pm

Have you thought about putting the same documents in multiple indices (with
different similarity algorithms) and then figuring out which algorithm works
best?

I am not sure how many documents you have but it is something to think
about.

Though some of the computation is done at query time, a good chunk
of preparatory work has to be done at index time for each algorithm, so I
don't think you can change it on the fly in the way you are proposing.
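
If the same documents are indexed under several similarities, the weighting idea from the original question could be approximated on the client side: fetch the top candidates, collect each document's score under every similarity, and combine them with learned weights. The sketch below is only an assumption about how that might be done; `rerank`, the score dictionaries, and the min-max normalisation step are illustrative choices, not an Elasticsearch or Lucene API.

```python
from typing import Dict, List, Tuple


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Re-rank candidate documents by a weighted sum of normalised scores.

    candidates maps doc id -> {similarity name -> raw score}.
    weights maps similarity name -> weight (e.g. learned offline).
    """
    doc_ids = list(candidates)
    if not doc_ids:
        return []
    combined = {doc_id: 0.0 for doc_id in doc_ids}
    for name, weight in weights.items():
        # Raw scores from different similarities are on different scales,
        # so normalise each similarity's scores before mixing them.
        raw = [candidates[doc_id].get(name, 0.0) for doc_id in doc_ids]
        for doc_id, norm in zip(doc_ids, min_max(raw)):
            combined[doc_id] += weight * norm
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    sample = {
        "a": {"bm25": 10.0, "dfr": 1.0},
        "b": {"bm25": 5.0, "dfr": 3.0},
        "c": {"bm25": 0.0, "dfr": 2.0},
    }
    print(rerank(sample, {"bm25": 0.5, "dfr": 0.5}))
```

This only touches the candidate set (for example, the first 500 hits), but it still relies on every similarity having been applied at index time, as described above.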

Take a look at the documentation [1] [2] to get a better understanding of the
inner workings of the Similarity feature.

[1]
http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/similarities/Similarity.html

[2]
http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

Author and Instructor for the Upcoming Book and Lecture Series
Massive Log Data Aggregation, Processing, Searching and Visualization with
Open Source Software
http://massivelogdata.com
