To Interchange the Similarity Algorithm

Hi people,

I've read in the ElasticSearch docs [1] that an index is bound to a
Similarity Algorithm (such as BM25, DFR, TF/IDF or IB).

However, I'd like to know whether it would be possible to specify a
Similarity Algorithm at runtime. I have some evidence that switching
between algorithms can bring better results if we use a machine learning
technique to weight the score of each Similarity Algorithm based on the
circumstances.

In other words, would it be possible, given a sample (e.g. the first 500
documents returned by BM25), to choose and run other Similarity Algorithms
just over those 500 documents instead of processing the whole index again?
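
To make the idea concrete, here is a rough sketch of the weighting step, assuming the per-algorithm scores for the sampled documents are already available on the client side. The function names, the min-max normalization and the fixed weights are all hypothetical illustrations (in practice the weights would come from the machine learning model); none of this is an ElasticSearch API.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one algorithm's scores to [0, 1] so that scores from
    different similarities can be added together."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents scored the same; treat them as equally good.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine_scores(
    per_algorithm: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores, highest combined score first.

    per_algorithm maps an algorithm name (e.g. "bm25") to {doc_id: score}.
    weights maps the same names to a weight (hypothetical: learned offline).
    """
    combined: Dict[str, float] = {}
    for name, scores in per_algorithm.items():
        weight = weights.get(name, 0.0)
        for doc_id, s in min_max_normalize(scores).items():
            combined[doc_id] = combined.get(doc_id, 0.0) + weight * s
    # Sort by descending score, then by doc id to keep ties deterministic.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

The open question is how to obtain the per-algorithm scores for only the sampled documents without scoring the whole index again.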

[1] http://www.elasticsearch.org/guide/reference/index-modules/similarity/

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

You can configure similarity per field, not per index. However, statistics
for certain algorithms are written at index time, so you can't change the
similarity at runtime; you would need to index the field multiple times.
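
As a rough illustration of indexing the same field several times, the sketch below builds an index-creation body in which one text field carries extra sub-fields, each with its own similarity. It is written as a plain Python dict and modelled on the settings/mapping layout of more recent Elasticsearch releases, so the exact syntax for your version may differ. The names `build_index_body`, `my_dfr`, `my_ib` and the field name `body` are made up for the example, and the DFR/IB parameter values are placeholders rather than recommendations.

```python
import json
from typing import Any, Dict


def build_index_body(field: str = "body") -> Dict[str, Any]:
    """Return an index-creation body where `field` is indexed three times,
    once per similarity (layout assumed from recent Elasticsearch docs)."""
    return {
        "settings": {
            "index": {
                "similarity": {
                    # Hypothetical custom similarity names and parameters.
                    "my_dfr": {
                        "type": "DFR",
                        "basic_model": "g",
                        "after_effect": "l",
                        "normalization": "h2",
                        "normalization.h2.c": "3.0",
                    },
                    "my_ib": {
                        "type": "IB",
                        "distribution": "ll",
                        "lambda": "df",
                        "normalization": "h2",
                    },
                }
            }
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "similarity": "BM25",
                    # Sub-fields index the same text with other similarities.
                    "fields": {
                        "dfr": {"type": "text", "similarity": "my_dfr"},
                        "ib": {"type": "text", "similarity": "my_ib"},
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be inspected or sent with any HTTP client.
    print(json.dumps(build_index_body(), indent=2))
```

With a mapping along these lines, a query against `body` would be scored with BM25, while `body.dfr` and `body.ib` would be scored with the other two similarities.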

simon


Have you thought about putting the same documents in multiple indices (with
different similarity algorithms) and then figuring out which algorithm works
best?
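
One simple way to do that comparison, assuming you have a set of documents judged relevant for a test query, is to compute precision at k for the ranking each index returns. This is only a sketch: the function names and the choice of precision at k as the metric are my own assumptions, and the rankings would come from running the same query against each index.

```python
from typing import Dict, List, Set, Tuple


def precision_at_k(ranked_ids: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / k


def best_algorithm(
    rankings: Dict[str, List[str]], relevant: Set[str], k: int
) -> Tuple[str, Dict[str, float]]:
    """Return the algorithm with the highest precision@k plus all scores.

    rankings maps an algorithm name to the doc ids its index returned,
    best first. Ties go to the alphabetically first name.
    """
    if not rankings:
        raise ValueError("rankings must not be empty")
    scores = {
        name: precision_at_k(ranked, relevant, k)
        for name, ranked in rankings.items()
    }
    winner = max(sorted(scores), key=lambda name: scores[name])
    return winner, scores
```

Averaging the metric over many test queries would give a more reliable picture than a single query.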

I am not sure how many documents you have but it is something to think
about.

Though some of the computation is done at query time, a good chunk of the
preparatory work has to be done at index time for each algorithm, so I
don't think you can change it on the fly the way you are proposing.

Take a look at this documentation [1] [2] to get a better understanding of
the inner workings of the Similarity feature.

[1] http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/similarities/Similarity.html
[2] http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

Author and Instructor for the Upcoming Book and Lecture Series
Massive Log Data Aggregation, Processing, Searching and Visualization with
Open Source Software

http://massivelogdata.com
