Query Help


(Paul Loy) #1

Hi all,

I was wondering what the best query might be for the following scenario.

I would like to search based on an average rating and time. So I want the
highest rated, but also new things. So a query may return something like
this:

rating 0, just created
rating 3, created a few minutes ago
rating 5, created an hour ago
rating 4, created an hour ago
rating 3, created a day ago
rating 5, created a week ago
rating 2, created a day ago
rating 1, created an hour ago

As you can see, I kind of want the query to combine freshness and goodness.
Should I just perform 2 queries for this and merge them in code, or is there
a way to create a score based on the rating and age and order on that?

Thanks,

Paul.


Paul Loy
paul@keteracel.com
http://www.keteracel.com/paul


(Clinton Gormley) #2

Hi Paul

You can use the custom-score query, and provide a script which will
alter the _score to take recency into account:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/custom_score_query/

clint

--
Web Announcements Limited is a company registered in England and Wales,
with company number 05608868, with registered address at 10 Arvon Road,
London, N5 1PR.


(Paul Loy) #3

I'd want something like this:

Score = Total_Rating / ( (Weight / Average_Rating)* AgeOfRecordInHours)

or see here for a readable form of that: http://bit.ly/cTOIae

Hmm... so that should be doable with something like this:

    SearchRequestBuilder topRated = ...

    Map<String,Object> params = new HashMap<String, Object>();
    params.put("hour", new Double(((double)System.currentTimeMillis())/MILLIS_IN_AN_HOUR));
    params.put("weight", new Double(1D));

    topRated
        .setQuery(termQuery("category", category))
        .addScriptField("composite_score",
            "doc['totalRating'].value / ( ( weight / doc['rating'].value ) * ( doc['hourcreated'].value - hour ) )",
            params)
        .addSort("composite_score", SortOrder.DESC);

The idea is that highly rated things decay over time and only stay at the
top of a list if they continue to be rated highly.
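
To make the decay concrete, here is a minimal plain-Java sketch of the formula above, with no Elasticsearch calls. The class DecayScore, its Item holder, and the two clamping constants are hypothetical names of mine, not part of any API. It assumes the age is positive (current hour minus creation hour) and clamps the age and average rating so that a just-created or unrated record does not divide by zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DecayScore {
    // Assumption: floor values that keep the division defined.
    static final double MIN_AGE_HOURS = 1.0 / 60.0; // one minute
    static final double MIN_RATING = 0.1;

    // Hypothetical holder for the values the formula needs.
    static final class Item {
        final String name;
        final double totalRating;
        final double averageRating;
        final double ageHours;

        Item(String name, double totalRating, double averageRating, double ageHours) {
            this.name = name;
            this.totalRating = totalRating;
            this.averageRating = averageRating;
            this.ageHours = ageHours;
        }
    }

    // Score = Total_Rating / ((Weight / Average_Rating) * AgeOfRecordInHours)
    static double score(double totalRating, double averageRating, double ageHours, double weight) {
        double safeAverage = Math.max(averageRating, MIN_RATING);
        double safeAge = Math.max(ageHours, MIN_AGE_HOURS);
        return totalRating / ((weight / safeAverage) * safeAge);
    }

    // Returns a copy of the items ordered by descending score.
    static List<Item> rank(List<Item> items, double weight) {
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingDouble(
                (Item i) -> score(i.totalRating, i.averageRating, i.ageHours, weight)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("rated 5, a week old", 50, 5, 168),
                new Item("rated 3, an hour old", 9, 3, 1));
        for (Item item : rank(items, 1.0)) {
            System.out.println(item.name + " -> "
                    + score(item.totalRating, item.averageRating, item.ageHours, 1.0));
        }
    }
}
```

With a weight of 1, the hour-old record scores about 27 and the week-old one about 1.5, so the newer record ranks first even though its average rating is lower. Note that a record with a total rating of 0 always scores 0 under this formula, however new it is.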

Thanks Clint!



(Shay Banon) #4

Hi,

A script field is only meant for fetching data, not for sorting. If you
want to sort using a script, you can use the script sort option. In the
Java API, it's .addSort(SortBuilders.scriptSort(script, type)).

The main drawback of sorting with a script is that it does not use the
score of the query itself (in your case, with the category-based term
query, that's not really relevant, but it might be for more free-text
search). For this, you can use the custom score query, which can take the
score into account.
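
As a rough illustration of that difference, here is a plain-Java sketch with no Elasticsearch calls. The class RelevanceBlend, its method names, and the choice to multiply the query score by a rating-over-age factor are all assumptions made for the example; the real script can combine the values however you like.

```java
class RelevanceBlend {
    // Script-sort style: the ordering value depends only on document fields.
    // Assumption: age is floored at one hour to avoid dividing by zero.
    static double scriptSortValue(double rating, double ageHours) {
        return rating / Math.max(ageHours, 1.0);
    }

    // Custom-score style: the query's relevance score is one of the inputs,
    // so a poor text match is pulled down even if it is fresh and well rated.
    static double customScoreValue(double queryScore, double rating, double ageHours) {
        return queryScore * scriptSortValue(rating, ageHours);
    }

    public static void main(String[] args) {
        // Two hypothetical hits: A matches the query weakly, B matches it strongly.
        System.out.println("A sort value:   " + scriptSortValue(5, 2));
        System.out.println("B sort value:   " + scriptSortValue(4, 2));
        System.out.println("A custom score: " + customScoreValue(0.2, 5, 2));
        System.out.println("B custom score: " + customScoreValue(2.0, 4, 2));
    }
}
```

Ordered by the script value alone, A (2.5) comes before B (2.0). Once the query score is folded in, B comes before A, which is the behaviour you would want for free-text search.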

In terms of performance, the custom score query will behave better than
script sorting, mainly because score-based sorting (i.e. when no sort is
provided) is faster. One more note regarding performance: if you have a
small set of categories (or in any case do sorting / faceting on them),
then executing a filtered query with a match_all query and a term filter
will be faster (in the upcoming 0.11).

-shay.banon


