More accurate date-based scoring

PUT /test/test/1
{
  "date": "2013-04-01T00:00:00Z"
}

PUT /test/test/2
{
  "date": "2013-04-01T00:00:01Z"
}

PUT /test/test/3
{
  "date": "2013-04-01T00:00:03Z"
}

PUT /test/test/4
{
  "date": "2013-04-01T00:01:03Z"
}

Given these documents, I'm trying to come up with a function_score query
that scores them such that they come out in their natural sort order. My
problem is that ids 1 to 3 always come out with exactly the same score.

GET /test/test/_search
{
  "query": {
    "function_score": {
      "score_mode": "max",
      "functions": [
        {
          "exp": {
            "date": {
              "origin": "2014-10-01T00:00:00Z",
              "scale": "1000d"
            }
          }
        }
      ]
    }
  }
}

This query is a good example of the problem: documents 1 to 3 come back with identical scores.

I tried prototyping a script query, which seems to reveal the real issue:
the dates end up with an accuracy of only about one minute.

GET /test/_search
{
  "fields": ["date"],
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "score_mode": "max",
      "functions": [
        {
          "script_score": {
            "lang": "expression",
            "script": "doc['date'].value"
          }
        }
      ]
    }
  }
}

This query returns the raw field value as the score. For the first three
documents I get the same score, 1364774350000; the fourth document is
scored 1364774490000. It looks very much like Elasticsearch is rounding
the timestamp internally.
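The collapse can in fact be reproduced outside Elasticsearch. Lucene represents scores as single-precision (32-bit) floats, and round-tripping the epoch-millisecond values of the four documents above through a 32-bit float shows the same bucketing (a Python sketch; the millisecond values are my own conversion of the dates above):

```python
import struct

def as_float32(x):
    """Round-trip a number through a single-precision (32-bit) float,
    the type Lucene uses for document scores."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Epoch milliseconds for the four test documents
timestamps = {
    1: 1364774400000,  # 2013-04-01T00:00:00Z
    2: 1364774401000,  # 2013-04-01T00:00:01Z
    3: 1364774403000,  # 2013-04-01T00:00:03Z
    4: 1364774463000,  # 2013-04-01T00:01:03Z
}
scores = {doc_id: as_float32(ms) for doc_id, ms in timestamps.items()}

# Near 1.36e12 a 32-bit float can only step in increments of 2**17
# (131072 ms, roughly two minutes), so documents 1-3 collapse together.
assert scores[1] == scores[2] == scores[3]
assert scores[4] != scores[1]
```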

Is there a way to get second-level accuracy (or better) here? I know I
could sort instead, but that would effectively discard any meaningful
ranking from the rest of my query. And minute-level accuracy is just not
going to be good enough either.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5c8b7caa-06d5-4992-a466-b9cc4a8f397b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

This is not an Elasticsearch issue.

Timestamps are represented as 64-bit longs holding milliseconds. Scores
are represented as IEEE 754 doubles. It is not possible to represent every
64-bit long in a double, because a double has only a 52-bit fraction field
(53 significant bits):

http://en.wikipedia.org/wiki/Double-precision_floating-point_format

So longs that need more than 52 bits of resolution can no longer be mapped
injectively onto scores (distinct timestamps stop getting distinct scores).
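The cutoff is easy to check (a quick Python sketch; Python floats are IEEE 754 doubles):

```python
# A double has a 52-bit fraction field (53 significant bits), so
# integers are represented exactly only up to 2**53; above that,
# distinct longs start collapsing onto the same double:
assert float(2**53 - 1) == 2**53 - 1      # still exact below the limit
assert float(2**53 + 1) == float(2**53)   # collapses: the +1 is lost
```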

You have the following options:

  • use a coarser resolution than seconds

  • organize the timestamp into two or more fields for script-based
    calculations (e.g. days since 01-Jan-1970 plus seconds of day)

  • use only a subrange of all possible timestamps between Thu Jan 01
    01:00:00 CET 1970 and Sun Aug 17 08:12:55 CET 292278994, one that fits
    into 52 bits so the values can be represented exactly by doubles
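The second option might look like this at index time (a Python sketch; the field names `date_days` and `date_seconds` are made up for illustration):

```python
from datetime import datetime, timezone

def split_timestamp(iso_date):
    """Split an ISO-8601 timestamp into two small numbers, each of
    which fits exactly in a score: whole days since the epoch, and
    seconds within that day."""
    dt = datetime.strptime(iso_date, "%Y-%m-%dT%H:%M:%SZ")
    epoch_seconds = int(dt.replace(tzinfo=timezone.utc).timestamp())
    return {
        "date_days": epoch_seconds // 86400,
        "date_seconds": epoch_seconds % 86400,
    }

# Document 4 from the example above:
print(split_timestamp("2013-04-01T00:01:03Z"))
# {'date_days': 15796, 'date_seconds': 63}
```

A script could then recombine the two fields (or weight them separately) without either one overflowing the score's precision.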

Jörg


Thanks for the clarification. I'll have to experiment a bit with this.

How would the multi-field solution work on the ranking side? You'd still
have the precision problem when combining the scores, right?

Jilles


I found another workable solution.

"sort": [
  {
    "_score": {
      "order": "desc"
    }
  },
  {
    "date": {
      "order": "desc"
    }
  }
]

This sorts first by score and then by date, which effectively ranks by
score and breaks ties between equal scores by date.
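Putting it together with the earlier function_score query, the full request might look like this (a sketch reusing the examples from above):

GET /test/test/_search
{
  "query": {
    "function_score": {
      "score_mode": "max",
      "functions": [
        {
          "exp": {
            "date": {
              "origin": "2014-10-01T00:00:00Z",
              "scale": "1000d"
            }
          }
        }
      ]
    }
  },
  "sort": [
    { "_score": { "order": "desc" } },
    { "date": { "order": "desc" } }
  ]
}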
