Custom Score Query and Sort questions


(John Chang) #1

My application needs to have returned hits ordered either by a text field or a date field. I've looked at the Custom Score Query doc (http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/custom_score_query/) and the Sort doc (http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/sort/), tried them out, and searched the forum. I'm afraid I'm still wondering:

  1. The custom_score queries with a script seem to sort based on the script criteria as well (correct me if I am wrong). So, aside from performance differences, what are the functional differences between a custom score query with a script and sorting?

  2. Is it the case that a custom_score query with a script actually changes the score values, whereas a sort will not change score values but just return hits in a different order? I suspected this from the docs, but I'm having trouble testing the idea because all my scores come back 0.0 for each hit in my tests.

  3. The Sort doc reads, "Note, it is recommended, for single custom based script based sorting, to use custom_score query instead as sorting based on score is faster." So, when would one want (or need) to use sort over custom_score queries with a script to get ordered results? (Perhaps the answers to the above answer this.)

  4. One of my searches needs to have results ordered alphabetically by a name field (if present), else by an email field. Is it correct to believe this would have to be handled by a custom_score with a script (as I need if-else logic) and a simple sort won't work?

  5. The scripting module doc (http://www.elasticsearch.com/docs/elasticsearch/modules/scripting/) lists fields of type short, string, double, date, long, etc. If I need results ordered by date, what is the best way to store that field from a performance perspective?

  6. Does sharding impact ordered search performance?

  7. Are there any other important performance considerations for ordering results through Elastic Search I should be aware of (aside from the standard Lucene considerations)?

As always, thanks so much for your time and for an awesome technology!


(Shay Banon) #2

On Wed, Oct 6, 2010 at 7:34 PM, John Chang jchangkihtest2@gmail.com wrote:

My application needs to have returned hits ordered either by a text field or
a date field. I've looked at the Custom Score Query doc
(http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/custom_score_query/)
and the Sort doc
(http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/sort/),
tried them out, and searched the forum. I'm afraid I'm still wondering:

  1. The custom_score queries with a script seem to sort based on the script
    criteria as well (correct me if I am wrong). So, aside from performance
    differences, what are the functional differences between a custom score
    query with a script and sorting?

The custom score query lets you provide a custom calculation of the score
of each document. With sorting, hits are ordered by the value of the
field, without any custom calculation.
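
To make the distinction concrete, here is a rough sketch of the two request bodies, built as Python dicts and printed as JSON. The field names (title, popularity, post_date), the inner term query, and the exact sort syntax are assumptions for illustration only; check them against the docs for your version.

```python
import json

# Hypothetical inner query; any query can be wrapped the same way.
inner_query = {"term": {"title": "elasticsearch"}}

# custom_score: the script computes a new _score for each hit,
# and hits come back ordered by that computed score.
custom_score_body = {
    "query": {
        "custom_score": {
            "query": inner_query,
            "script": "_score * doc['popularity'].value",
        }
    }
}

# Field sort: hits are ordered by the indexed value of a field,
# with no custom calculation involved.
field_sort_body = {
    "query": inner_query,
    "sort": [{"post_date": {"order": "desc"}}],
}

print(json.dumps(custom_score_body, indent=2))
print(json.dumps(field_sort_body, indent=2))
```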

  2. Is it the case that a custom_score query with a script actually changes
    the score values, whereas a sort will not change score values but just
    return hits in a different order? I suspected this from the docs, but I'm
    having trouble testing the idea because all my scores come back 0.0 for
    each hit in my tests.

Yes, it changes the score value. If the query would have had a score of 0.25
for a certain document, and your script is (for simplicity's sake) "_score *
2", then the score of that document will be 0.5.
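
The arithmetic can be mimicked outside the server with a small Python sketch. This only simulates the effect of a score script on a list of hits; the function name and the hit layout are made up for illustration and are not part of elasticsearch.

```python
def apply_score_script(hits, script):
    """Return copies of the hits with rescored values, highest new score first."""
    rescored = [{**hit, "_score": script(hit["_score"])} for hit in hits]
    return sorted(rescored, key=lambda h: h["_score"], reverse=True)


# Two hypothetical hits with the scores the inner query produced.
hits = [{"_id": "a", "_score": 0.25}, {"_id": "b", "_score": 0.4}]

# Equivalent of the script "_score * 2".
doubled = apply_score_script(hits, lambda score: score * 2)

for hit in doubled:
    print(hit["_id"], hit["_score"])
```

A plain sort, by contrast, would leave the _score values as the query produced them and only change the order of the hits.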

  3. The Sort doc reads, "Note, it is recommended, for single custom based
    script based sorting, to use custom_score query instead as sorting based on
    score is faster." So, when would one want (or need) to use sort over
    custom_score queries with a script to get ordered results? (Perhaps the
    answers to the above answer this.)

You can also provide a script that will produce the sort values (compared
with just saying "sort by this field"). If you do so, though, and it's the
only sorting you do, then it's usually better to use the same script
with a custom score query instead. Note that this only applies to numeric
sorting with float precision.
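
As a sketch of what "the same script, just with a custom score query" could look like, here are both request bodies as Python dicts. The price field, the factor parameter, and the match_all inner query are hypothetical, and the _script sort syntax is written from memory of the Sort doc, so treat it as an assumption.

```python
import json

# Hypothetical numeric script shared by both variants.
script = "doc['price'].value * factor"
params = {"factor": 1.1}

# Variant 1: script-based sort producing numeric sort values.
script_sort_body = {
    "query": {"match_all": {}},
    "sort": {
        "_script": {
            "script": script,
            "type": "number",
            "params": params,
            "order": "desc",
        }
    },
}

# Variant 2: the same script used as the score, so the default
# ordering by score (highest first) gives the same ranking.
custom_score_body = {
    "query": {
        "custom_score": {
            "query": {"match_all": {}},
            "script": script,
            "params": params,
        }
    }
}

print(json.dumps(script_sort_body, indent=2))
print(json.dumps(custom_score_body, indent=2))
```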

  4. One of my searches needs to have results ordered alphabetically by a name
    field (if present), else by an email field. Is it correct to believe this
    would have to be handled by a custom_score with a script (as I need
    if-else logic) and a simple sort won't work?

The sort element can have 2 fields to sort by: first the name, and then the
email. If that does not work (i.e. the problem is not ties between similar
names, but names that simply don't exist), then a script can be used with the
mentioned "if / else". That script should be a custom sort script and not a
custom_score query, since it produces a string and not a number (had it
produced a number, you could have tried custom score).
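
Here is a minimal sketch of that "if / else", first as plain Python to show the intended ordering, then as a request body. The documents are invented, and the script string (in particular the use of doc['name'].empty and the ternary) is an untested assumption about the scripting API, not a verified example.

```python
def sort_key(doc):
    """Use the name when present and non-empty, otherwise fall back to the email."""
    return doc.get("name") or doc["email"]


# Hypothetical documents; the first one has no name field.
docs = [
    {"email": "zed@example.com"},
    {"name": "Alice", "email": "x@example.com"},
    {"name": "Bob", "email": "a@example.com"},
]

# Case-insensitive alphabetical order on the chosen key.
ordered = sorted(docs, key=lambda d: sort_key(d).lower())
keys = [sort_key(d) for d in ordered]
print(keys)

# Assumed shape of a string-producing sort script for the same logic.
script_sort_body = {
    "query": {"match_all": {}},
    "sort": {
        "_script": {
            "script": "doc['name'].empty ? doc['email'].value : doc['name'].value",
            "type": "string",
            "order": "asc",
        }
    },
}
```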

Note that mvel (the scripting language) gets a bit annoying when trying to
implement complex logic (though it's very, very fast for formulas). I am
working on allowing scripts to be provided in other languages.

  5. The scripting module doc
    (http://www.elasticsearch.com/docs/elasticsearch/modules/scripting/) lists
    fields of type short, string, double, date, long, etc. If I need results
    ordered by date, what is the best way to store that field from a
    performance perspective?

The simplest would be to add a sort by field on the date field. If you need
to access it in a script, then the best way would be to access it as it is
stored in the index, which is milliseconds since the epoch as a long (this is
what you would get when you do: doc['my_date_field'].value).
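
For reference, here is a small Python sketch of the "milliseconds since the epoch" representation, which is handy when comparing a date against a script parameter. The helper name is made up, and the example assumes timezone-aware UTC datetimes.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis):
    """Convert milliseconds since the Unix epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


posted = datetime(2010, 10, 6, 19, 34, tzinfo=timezone.utc)
millis = to_epoch_millis(posted)
print(millis, from_epoch_millis(millis).isoformat())
```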

  6. Does sharding impact ordered search performance?

Basically, each query is a "map / reduce" operation. The query gets executed
on the relevant shards, and the results then get reduced back to a single
response (simplified). So the more machines you have, with shards allocated
across them, the faster the search will be. Note that replicas also play a
role here (for example, increasing index.number_of_replicas from 1 to 2),
since they are searchable as well.
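
The "map / reduce" idea for a sorted search can be sketched in a few lines of Python. This is a toy model under simplifying assumptions (in-memory lists standing in for shards, a single numeric sort value per hit), not how elasticsearch is implemented internally.

```python
import heapq
from itertools import islice


def search_shard(shard_docs, size):
    """'Map' step: a shard returns its own top hits, ordered by sort value."""
    return sorted(shard_docs, key=lambda d: d["sort"])[:size]


def reduce_hits(per_shard_hits, size):
    """'Reduce' step: merge the already-sorted shard results, keep the global top hits."""
    merged = heapq.merge(*per_shard_hits, key=lambda d: d["sort"])
    return list(islice(merged, size))


# Two hypothetical shards holding two documents each.
shards = [
    [{"_id": "a", "sort": 5}, {"_id": "b", "sort": 1}],
    [{"_id": "c", "sort": 3}, {"_id": "d", "sort": 2}],
]

top = reduce_hits([search_shard(s, 2) for s in shards], size=3)
print([hit["_id"] for hit in top])
```

Each shard only needs to sort its own documents, which is why spreading shards over more machines helps; the final merge works on small, pre-sorted lists.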

  7. Are there any other important performance considerations for ordering
    results through Elastic Search I should be aware of (aside from the
    standard Lucene considerations)?

Not sure what you include in the standard Lucene considerations, but
elasticsearch has a mechanism similar in nature to the Lucene FieldCache,
so when you sort on a field (or access it using doc[...] in a script),
its terms will be loaded into memory.

As always, thanks so much for your time and for an awesome technology!

No problem, here to help!


