Truncating scores

Hello everybody,

I am using the function_score query to compute a custom score for
items I am indexing into Elasticsearch. I am using a native script (written
in Java) to compute my score, based on a date (Date.getTime()). When I use a
logger and look at what my native script returns, I get what I want, but when
I look at the score of the items returned by the query (I use the replace
mode), I get a truncated number (e.g. if the score computed in the native
script is 1 392 028 423 243, the items come back with a score of
1 392 028 420 000). The problem is that I am losing the milliseconds and
seconds (I only keep the tens digit of the seconds). Losing milliseconds
would be acceptable, but I cannot lose seconds.

Is this a limitation of Elasticsearch? Is there any way to work around it?

Thanks in advance for your replies.

Regards,
Loïc Wenkin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ccf7c19e-aa70-42ac-a4a4-d7174ab0de49%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Scores are Java floats, so I'd expect them to be less precise than the long
that getTime returns. I believe you could look at sorting rather than
scoring, or look at reducing the precision of the top bits of your long.
You know, y2k bug style.

The reason the score is a float is that for text scoring it's exact enough.
Also, some of the Lucene data structures are actually more lossy than
float. The field norm, iirc, is a floating point number packed into 8 bits
rather than the float's 32.

Nik
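A minimal sketch of the precision loss Nik describes, using only the plain JDK (no Elasticsearch involved) and the timestamp from the original post. Casting the millisecond epoch long to a 32-bit float, as the scoring path effectively does, rounds away the low bits:

```java
// Demonstrates how a millisecond timestamp loses precision when it is
// squeezed into a Java float, which is what happens to a score.
public class FloatScorePrecision {
    public static void main(String[] args) {
        long millis = 1392028423243L;      // Date.getTime() value from the post
        float score = (float) millis;      // scores are 32-bit floats
        long roundTripped = (long) score;  // what the score effectively encodes

        System.out.println("original : " + millis);
        System.out.println("as float : " + roundTripped);
        System.out.println("error(ms): " + Math.abs(millis - roundTripped));
        // A float has a 24-bit significand; a value around 2^41 is rounded
        // to a multiple of 2^17 = 131072 ms, so whole seconds are lost.
    }
}
```

Running this shows an error of several thousand milliseconds, which matches the truncation described in the question.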



Hello Nikolas,

Thanks for your reply. I have done something like what you suggested: I
divide the score by 5000 before returning it. This removes the milliseconds
and keeps a precision of 5 seconds, which I expect to be enough. If it is
still a problem, I may subtract some years from the date to get a smaller
number.
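A sketch of the "subtract some years" idea combined with the divide, using only the plain JDK. The base epoch (2014-01-01T00:00:00Z) and the 5000 ms bucket are illustrative choices, not anything from the Elasticsearch API. The point is that dividing alone may not be enough, because the quotient can still need more than the 24 significand bits a float offers; subtracting a recent base epoch first keeps the value small enough to survive the float conversion exactly:

```java
// Reduce a millisecond timestamp to a small integer that fits exactly
// in a float, so it can safely be used as a score.
public class ReducedScore {
    static final long BASE_EPOCH = 1388534400000L; // 2014-01-01T00:00:00Z, illustrative
    static final long BUCKET_MS = 5000L;           // 5-second precision, as in the post

    static long reduce(long epochMillis) {
        return (epochMillis - BASE_EPOCH) / BUCKET_MS;
    }

    public static void main(String[] args) {
        long millis = 1392028423243L;   // timestamp from the post
        long reduced = reduce(millis);  // 5-second buckets since the base epoch

        // millis / 5000 alone still needs ~29 bits, so a float rounds it;
        // the subtracted base keeps the value under 2^24, so it is exact.
        System.out.println("reduced         : " + reduced);
        System.out.println("survives float  : " + ((long) (float) reduced == reduced));
        System.out.println("plain divide ok : "
                + ((long) (float) (millis / BUCKET_MS) == millis / BUCKET_MS));
    }
}
```

With the example timestamp, the reduced value round-trips through a float unchanged, while the plain `millis / 5000` value does not.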

I think that using sort would be hard work, since I have something like
this in my documents:

a: {
  b: {
    objectsSortableByDate: [
      ...
    ]
  },
  c: {
    objectsSortableByDate: [
      ...
    ]
  }
}

I want to filter my entities according to the smallest (or highest) date of
any "objectsSortableByDate" entry (whether it is in b or in c), and
sometimes I may have more than two nested objects, so I think the easiest
way to sort is by a computed score. If you have a better idea, I will take
it :)

Loïc

