So, this query seems to do what we want: it multiplies the query score for
each document by our custom trending score (stored in the "trendingScore"
field). The problem is that the trending score, in many cases, overwhelms
the query score. Thus, documents with very low relevancy, but very high
trending, are at the top of our results. Ideally, we'd filter the query to
only return the top N percent of documents that matched, but I don't
think that's possible. We've looked at the min_score parameter for queries
as well, but I don't know what a "good" value would be for this.
Does anyone have any ideas on the best way to solve this problem? Thank you
ahead of time!
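To see why a plain product lets trending swamp relevance, here is a small Python sketch with two hypothetical documents (the scores are made up for illustration). It compares the product described above with one possible alternative, a log-dampened trending factor. The `weight` value and the `log1p` dampening are assumptions for illustration, not something proposed in this thread.

```python
import math


def multiplicative_score(query_score: float, trending: float) -> float:
    """Plain product of the query score and trendingScore, as described above."""
    return query_score * trending


def dampened_score(query_score: float, trending: float, weight: float = 0.1) -> float:
    """Hypothetical alternative: log-dampen trendingScore so it scales the
    relevance score by a modest factor instead of dominating it."""
    return query_score * (1.0 + weight * math.log1p(trending))


# Hypothetical documents: (label, query score, trendingScore)
DOCS = [
    ("relevant, mildly trending", 10.0, 5.0),
    ("barely relevant, very trending", 0.5, 10_000.0),
]

for label, q, t in DOCS:
    print(f"{label}: product={multiplicative_score(q, t):.2f}, "
          f"dampened={dampened_score(q, t):.2f}")
```

With the plain product the barely relevant document wins (5000.00 against 50.00); with the dampened factor the order flips (0.96 against 11.79).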
My advice would be to use the script_score function type. Inside it you can
access _score, which is the score given by the query, as well as the value of
the field. Mix them together with whatever logic you want.
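That suggestion, written out as a request body built in Python. This is a minimal sketch that assumes a newer Elasticsearch with Painless scripting (the `lang`/`source` keys); the 2014-era 1.x releases used Groovy and different script parameters, so treat the exact keys as an assumption. The `match` query on a `title` field and the log-dampened formula are hypothetical; only `trendingScore` comes from the thread. Documents without a `trendingScore` value would need a guard in the script.

```python
import json


def trending_function_score(user_query: dict, weight: float = 0.1) -> dict:
    """Wrap a relevance query in function_score with a script_score that mixes
    _score and the trendingScore field (Painless syntax assumed)."""
    script_source = (
        # _score is the score of the wrapped query; trendingScore is log-dampened
        # (hypothetical formula) so it cannot overwhelm relevance.
        "_score * (1 + params.weight * Math.log1p(doc['trendingScore'].value))"
    )
    return {
        "query": {
            "function_score": {
                "query": user_query,
                "script_score": {
                    "script": {
                        "lang": "painless",
                        "source": script_source,
                        "params": {"weight": weight},
                    }
                },
                # The script already includes _score, so use its result as-is
                # instead of multiplying it by _score again (the default mode).
                "boost_mode": "replace",
            }
        }
    }


body = trending_function_score({"match": {"title": "elasticsearch"}})
print(json.dumps(body, indent=2))
```

Using `boost_mode: replace` keeps all of the combination logic in the script; with the default `multiply`, the query score would effectively be applied twice.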
If you go the script_score route, don't you run into performance issues? I
would think running, say, thousands of queries like that would not be
performant, unless I am missing something.
On Sunday, December 21, 2014 8:01:06 PM UTC-8, vineeth mohan wrote:
Hello,
My advice would be to use the script_score function type. Inside it you can
access _score, which is the score given by the query, as well as the value of
the field. Mix them together with whatever logic you want.
Thanks
Vineeth
On Thu, Dec 11, 2014 at 7:28 PM, hespoddi <ch...@publishthis.com> wrote:
Hi all,
We'd like to combine the query score with our own custom trending score
for a given document. Currently, our query looks like:
I don't see why it should run into performance issues.
Whichever way you do it, the _score and the score derived from the field
have to be computed/loaded.
If you precompile your script by placing it in the config directory, that
should be good enough.
Also, feel free to write the same logic in Java and plug it in as a native
script.
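On later Elasticsearch releases (6.x onward), file-based scripts in the config directory are no longer supported; stored scripts, registered once with `PUT _scripts/<id>` and then referenced by id, serve the same compile-once purpose. The Python sketch below only builds the two request bodies and sends nothing. The script id `trending-score` is hypothetical, and Painless syntax is again an assumption about the version in use.

```python
import json

# Hypothetical id under which the script is stored in the cluster.
SCRIPT_ID = "trending-score"

# Same product the thread describes: query score times trendingScore.
TRENDING_SOURCE = "_score * doc['trendingScore'].value"


def stored_script_request(script_id: str, source: str) -> tuple:
    """Method, path and body for registering a Painless stored script."""
    return ("PUT", f"/_scripts/{script_id}",
            {"script": {"lang": "painless", "source": source}})


def query_using_stored_script(user_query: dict, script_id: str) -> dict:
    """function_score body that references the stored script by id."""
    return {
        "query": {
            "function_score": {
                "query": user_query,
                "script_score": {"script": {"id": script_id}},
                # The stored script already multiplies by _score.
                "boost_mode": "replace",
            }
        }
    }


method, path, body = stored_script_request(SCRIPT_ID, TRENDING_SOURCE)
print(method, path, json.dumps(body))
print(json.dumps(query_using_stored_script({"match_all": {}}, SCRIPT_ID)))
```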
The problem isn't really the query. The problem is that we'd like to limit
the results of the query to just "high" scores before we apply the
function_score. There is a min_score parameter we could use:
But what the min_score should be will, obviously, vary significantly
depending on the query. Ideally, we'd set the min_score to some percentile
of the max score for the query, but I don't think that's possible:
I was curious whether anyone had any other ideas about how to do this (or
something close)?
-Chris
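One way to approximate "some percentile of the max score" is to use two requests: run the plain relevance query first and read `hits.max_score` from the response, then send the trending query with a threshold derived from it. The Python sketch below builds only the second request body. The cutoff is enforced inside the script (hits whose relevance `_score` falls below the threshold score 0.0), and a small top-level `min_score` drops those zero-scored hits. The `fraction` default, the epsilon, and the Painless syntax are assumptions; relevance scores can also drift slightly between the two requests (for example across replicas), so the cutoff is approximate.

```python
import json

# Smallest final score a hit may have; anything scored 0.0 by the script is dropped.
MIN_FINAL_SCORE = 1e-6


def relevance_threshold(max_score: float, fraction: float) -> float:
    """Cutoff expressed as a fraction of the best relevance score."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return max_score * fraction


def thresholded_trending_query(user_query: dict, max_score: float,
                               fraction: float = 0.5) -> dict:
    """Second-request body: only hits whose relevance is at least
    fraction * max_score keep a non-zero, trending-weighted score."""
    threshold = relevance_threshold(max_score, fraction)
    return {
        "min_score": MIN_FINAL_SCORE,
        "query": {
            "function_score": {
                "query": user_query,
                "script_score": {
                    "script": {
                        "lang": "painless",
                        # A plain product also zeroes out hits whose
                        # trendingScore is 0, so those are dropped too.
                        "source": ("_score >= params.threshold"
                                   " ? _score * doc['trendingScore'].value : 0.0"),
                        "params": {"threshold": threshold},
                    }
                },
                "boost_mode": "replace",
            }
        },
    }


# Hypothetical: max_score taken from the first response's hits.max_score.
body = thresholded_trending_query({"match": {"title": "elasticsearch"}}, max_score=8.0)
print(json.dumps(body, indent=2))
```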
On Monday, December 22, 2014 1:34:08 PM UTC-5, vineeth mohan wrote:
Hi,
I don't see why it should run into performance issues.
Whichever way you do it, the _score and the score derived from the field
have to be computed/loaded.
If you precompile your script by placing it in the config directory, that
should be good enough.
Also, feel free to write the same logic in Java and plug it in as a native
script.
Thanks
Vineeth
On Mon, Dec 22, 2014 at 10:30 PM, Scott Decker <sc...@publishthis.com> wrote:
If you go the script_score route, don't you run into performance issues?
I would think running, say, thousands of queries like that would not be
performant, unless I am missing something.
We originally thought rescoring would work as well! I actually implemented
it, but low-relevancy documents continued to show up at the top of our
results. I didn't understand this at first, but on re-reading the
documentation I saw what the problem was: the rescore is executed on each
shard before the results are returned to the node handling the overall
request. So if an individual shard only had low-relevancy, high-trending
documents for a query, we'd run into exactly the same problem as before.
And, in fact, that does seem to happen often enough to be an issue for us.
At the moment, the only way I see to solve the problem is to do a
post-processing step on the returned documents.
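A sketch of that post-processing step in Python, assuming the application fetches a window of hits from a plain relevance query (no function_score) with `trendingScore` included in each hit. Because the cutoff is computed from the merged results rather than per shard, it sidesteps the shard-local problem described above; documents outside the fetched window are simply never considered. The `Hit` type and the `fraction` and `top_k` defaults are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    doc_id: str
    query_score: float  # relevance _score from the plain query
    trending: float     # the document's trendingScore value


def rerank(hits: List[Hit], fraction: float = 0.5, top_k: int = 10) -> List[Hit]:
    """Keep hits whose relevance is within `fraction` of the best hit,
    then order the survivors by relevance * trendingScore."""
    if not hits:
        return []
    cutoff = fraction * max(h.query_score for h in hits)
    kept = [h for h in hits if h.query_score >= cutoff]
    kept.sort(key=lambda h: h.query_score * h.trending, reverse=True)
    return kept[:top_k]


hits = [
    Hit("a", 10.0, 5.0),
    Hit("b", 0.5, 10_000.0),  # barely relevant but very trending
    Hit("c", 8.0, 20.0),
]
print([h.doc_id for h in rerank(hits)])  # ['c', 'a']
```

Here the cutoff is 5.0, so "b" is dropped before the trending product is applied, and "c" (160) ranks above "a" (50).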
On Monday, December 22, 2014 11:50:24 PM UTC-5, vineeth mohan wrote:
On Tue, Dec 23, 2014 at 2:46 AM, hespoddi <ch...@publishthis.com> wrote:
Hi all,
The problem isn't really the query. The problem is that we'd like to limit
the results of the query to just "high" scores before we apply the
function_score. There is a min_score parameter we could use:
But what the min_score should be will, obviously, vary significantly
depending on the query. Ideally, we'd set the min_score to some percentile
of the max score for the query, but I don't think that's possible:
I was curious whether anyone had any other ideas about how to do this (or
something close)?
-Chris