Using a score threshold


(Rossini) #1

Hello Guys,

I have a legacy system based on Solr 1.2 and I am trying to port it to
ES. One of the features of the system is the ability to create a
"context" inside your index. A "context" is a search of the index with
a big list of terms for some concept, like Middle East Economy, Weather
in Brazil, etc. As the list of terms in the search grows, the number
of docs returned by that search also grows, but docs with a low score
are outside the context (meaning that the document is not about that
subject), so I let the user choose a threshold score that defines
which docs are inside the context.

In that version of Solr (Lucene was 2.2), there was no way to
filter the documents of a search by score, and I needed to know how
many documents had a score higher than the threshold, so I created a
SearchHandler that used a ScoreCollector, which extended HitCollector
and just counted how many documents passed the threshold (a parameter
passed to the handler), and returned that count.
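
A minimal sketch of that counting logic, written in Python rather than the Java of the original handler. The names `ThresholdCounter` and `count_in_context` are hypothetical, and the hits are assumed to arrive as (doc_id, score) pairs; this only illustrates the idea of a collector that counts instead of storing hits, not the actual Solr code.

```python
from typing import Iterable, Tuple


class ThresholdCounter:
    """Counts hits whose score is strictly higher than a threshold.

    Hypothetical stand-in for the ScoreCollector described in the post:
    it keeps a running count and never stores the documents themselves.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.count = 0

    def collect(self, doc_id: int, score: float) -> None:
        # Called once per matching document, in the style of a hit collector.
        if score > self.threshold:
            self.count += 1


def count_in_context(hits: Iterable[Tuple[int, float]], threshold: float) -> int:
    """Return how many (doc_id, score) hits fall inside the context."""
    counter = ThresholdCounter(threshold)
    for doc_id, score in hits:
        counter.collect(doc_id, score)
    return counter.count
```

The comparison is strict (`>`) because the post talks about scores "higher than the threshold"; switching to `>=` would make the threshold inclusive.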

Do you guys have any clue how to achieve this with ES? Would I
have to write a plugin or something? Is there a way of filtering
results by a minimum score in ES? That would be great! I tried using
the _score in a filter script, but I see that the scorer inside
DocLookup is null. Is there some kind of after-score filter, i.e. a
filter that is applied after the score is computed?

Thanks for the help,
Rossini


(Thiago Souza) #2

That's a problem I'm facing too. Would like to know a workaround as well.

Cheers
(In reply to Rafael Rossini's post of Feb 22, 2011, 6:52 PM.)



(Shay Banon) #3

It's problematic to use the score in a filter that is part of the search process (compared to using it in facets). But you can open an issue, and we can add a minimum score parameter that will filter out results that score below it.
(In reply to Thiago Souza's post of Thursday, February 24, 2011 at 4:29 AM.)



(Shay Banon) #4

Never mind, pushed support for this: https://github.com/elasticsearch/elasticsearch/issues/719 :)
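
For anyone landing here later, a minimal sketch of what a request using the minimum score parameter might look like. It assumes the parameter is exposed as a top-level `min_score` field in the search request body, as it is in current Elasticsearch releases. The helper name `build_context_query`, the field name `body`, and the bool/should/match query shape are illustrative assumptions, not something taken from the issue; any query can take their place.

```python
import json
from typing import Any, Dict, List


def build_context_query(terms: List[str], field: str, min_score: float) -> Dict[str, Any]:
    """Build a search request body that drops hits scoring below min_score.

    Assumption: `min_score` is a top-level key of the search request body.
    The bool/should/match query is only an example of a "big list of terms".
    """
    return {
        "min_score": min_score,
        "query": {
            "bool": {
                # One optional clause per context term; more matches, higher score.
                "should": [{"match": {field: term}} for term in terms]
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical context terms and threshold, for illustration only.
    request_body = build_context_query(["economy", "oil", "opec"], "body", 0.5)
    # This JSON would be sent as the body of a search request.
    print(json.dumps(request_body, indent=2))
```

Keep in mind that scores are relative to the query and the index contents, so a threshold tuned for one context's term list may not carry over to another.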
(In reply to Shay Banon's post of Thursday, February 24, 2011 at 6:42 AM.)



(Thiago Souza) #5

Thanks!!
(In reply to Shay Banon's post of Feb 24, 2011, 1:56 AM.)



(Rossini) #6

Thanks a lot! Keep going with the great work.


