More efficient date sorting

Hi all,

We have an index with ms precision dates stored as longs.

To sort on this, if I understand correctly, we need to load all of the
longs into memory in the field cache.

However, if we know that all of the dates are < now(), we could use custom
scoring with a decay function to more efficiently sort the result set.

Is this a good idea - or crazy? :slight_smile:

Matt

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Custom scoring is expensive.

If you can restrict your sorting domain to int range, you do not even need
to encode dates as longs, just use ints instead (or even bytes).

Jörg


Fair point :slight_smile: In this case, would it be possible to map the
longs to ints as a multi-field value at index time? E.g. ms since epoch =>
minutes since X.

It strikes me that having control over sort granularity in ES would be a
cool feature.
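
To make the "ms since epoch => minutes since X" idea concrete, here is a minimal sketch of the index-time conversion. The base instant `BASE_MS` (standing in for "X") and the function name `minutes_since_base` are assumptions chosen for illustration; nothing here is an Elasticsearch API.

```python
from datetime import datetime, timezone

# Hypothetical base instant "X": any fixed point earlier than the oldest document.
BASE_MS = int(datetime(2010, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)
MS_PER_MINUTE = 60_000
INT32_MAX = 2**31 - 1


def minutes_since_base(epoch_ms: int, base_ms: int = BASE_MS) -> int:
    """Map a millisecond timestamp to whole minutes elapsed since base_ms."""
    minutes = (epoch_ms - base_ms) // MS_PER_MINUTE
    # Reject values that would not fit a signed 32-bit int field.
    if not 0 <= minutes <= INT32_MAX:
        raise ValueError("timestamp outside the int-range sort domain")
    return minutes
```

The mapping keeps ordering between timestamps that are at least a minute apart; timestamps within the same minute collapse to the same value, which is the granularity trade-off being discussed.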



Did you run any experiments comparing sorting on dates vs. the custom scoring suggestion you made?

No. But I would suggest that mapping from a long to an int would
obviously be more performant.


Hi,

Initially we too used System.currentTimeMillis(). Then we switched to two
int fields, something like yyyyMMdd and HHmmssSSS.

If the query's time criteria fall within a single date, then we don't sort
on the yyyyMMdd field. We get decent performance compared with sorting on
the System.currentTimeMillis() value.
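
As a rough sketch of the split described above, the following turns one millisecond timestamp into the two int values. It assumes UTC and the function name `split_timestamp` is made up for illustration; the actual indexing code in use was not posted.

```python
from datetime import datetime, timezone
from typing import Tuple


def split_timestamp(epoch_ms: int) -> Tuple[int, int]:
    """Split epoch milliseconds into (yyyyMMdd, HHmmssSSS) ints, assuming UTC."""
    seconds, millis = divmod(epoch_ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    day = dt.year * 10_000 + dt.month * 100 + dt.day
    # HH in the top digits, then mm, ss and the three millisecond digits.
    time_of_day = (
        dt.hour * 10_000_000 + dt.minute * 100_000 + dt.second * 1_000 + millis
    )
    return day, time_of_day
```

The largest possible HHmmssSSS value is 235959999, which fits in a signed 32-bit int.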

Hi Jörg,

How much memory will ES take when applying an aggregation on a long field
that contains ~86 million unique long values (1000 * 60 * 60 * 24 =
86,400,000 unique millis in a day)?

If I understand correctly, it is 86.4 million * 8 bytes for a long, i.e.
86,400,000 * 8 = 691,200,000 bytes (659 MB). In the case of yyyyMMdd as an
int field, 86,400,000 * 4 = 345,600,000 bytes (329 MB).

What is the role of Lucene's packed ints in this case? Sorry if I am
missing something.

Also, we are using doc values with the default option.
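
For reference, the arithmetic in the question can be reproduced as below. This is only the naive "number of values times bytes per value" estimate from the post, not a statement of how Elasticsearch or Lucene actually allocates memory; the helper names are made up.

```python
# Possible distinct millisecond values within a single day.
MS_PER_DAY = 1000 * 60 * 60 * 24  # 86,400,000


def raw_bytes(n_values: int, bytes_per_value: int) -> int:
    """Naive size estimate: one fixed-width slot per value."""
    return n_values * bytes_per_value


def to_mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


long_bytes = raw_bytes(MS_PER_DAY, 8)  # 8-byte long per value
int_bytes = raw_bytes(MS_PER_DAY, 4)   # 4-byte int per value

print(f"long: {long_bytes} bytes = {to_mib(long_bytes):.1f} MiB")
print(f"int:  {int_bytes} bytes = {to_mib(int_bytes):.1f} MiB")
```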


Your formula is not correct.

yyyyMMdd would map all values of a day to a single integer, and you get
something like "sort by day" or "filter by day".

Assuming your timestamps are evenly distributed over one year, you can
estimate: 80 million / 365 = about 219,178 timestamps per day. In the "day
field", you have only 365 integers in the cache instead of 80 million longs
for unique millis. If "day" is too coarse, you can add an hour, minute, or
second index.
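
The estimate above can be checked with a small simulation. This is a sketch under the same assumption of evenly spread timestamps, using a synthetic sample of 100,000 values in place of the real 80 million; all names are illustrative.

```python
MS_PER_DAY = 86_400_000
DAYS = 365
SAMPLE_SIZE = 100_000  # synthetic stand-in for the real document count

# Evenly spaced millisecond offsets covering one year.
step = DAYS * MS_PER_DAY // SAMPLE_SIZE
timestamps = [i * step for i in range(SAMPLE_SIZE)]

# Distinct values when sorting on raw millis versus on a per-day field.
distinct_millis = len(set(timestamps))
distinct_days = len({t // MS_PER_DAY for t in timestamps})

# Average documents per day for the 80 million figure used in the estimate.
per_day = 80_000_000 / DAYS

print(distinct_millis, distinct_days, round(per_day))
```

Every timestamp in the sample is a distinct millisecond value, while the day field takes only 365 distinct values.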

Jörg



Hi, sorry, I intended to say the HHmmssSSS field. When I apply sorting or aggregations on the HHmmssSSS field, how much memory will it take? In this case the number of unique values for the HHmmssSSS field can be up to 86,400,000 (~86.4 million).

FYI: we maintain daily indexes. When a user searches across days (for example, the last 7 days), I sort by both yyyyMMdd and HHmmssSSS. If the user searches a single day only (for example, today), I sort by the HHmmssSSS field alone.
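
A minimal sketch of that sort selection, building the `sort` section of a search request body as plain Python data. The field names are taken from the thread and may differ from the real mapping; `build_sort` is a hypothetical helper, not part of any client library.

```python
from typing import Dict, List

# Field names as written in the thread; the real mapping names may differ.
DAY_FIELD = "yyyyMMdd"
TIME_FIELD = "HHmmssSSS"


def build_sort(first_day: int, last_day: int, order: str = "desc") -> List[Dict]:
    """Return the sort clauses for a search spanning first_day..last_day."""
    sort: List[Dict] = []
    # Only sort on the day field when the search covers more than one day.
    if first_day != last_day:
        sort.append({DAY_FIELD: {"order": order}})
    sort.append({TIME_FIELD: {"order": order}})
    return sort


# Example: last 7 days versus a single day.
multi_day_body = {"sort": build_sort(20140920, 20140926)}
single_day_body = {"sort": build_sort(20140926, 20140926)}
```

Skipping the day field for single-day searches means only the time-of-day values have to be compared, which is the saving described above.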




Hi Anantha,

yes, that sounds reasonable to me. In that case you have two longs, and if
you filter by day, you can save resources for sorting within the day.

Jörg



Hi Jörg, we are dealing with logs. If a user debugs their code through
logs, I need to apply sorting on the HHmmssSSS field. yyyyMMdd and HHmmssSSS
are both int fields. How costly is sorting on the HHmmssSSS field (up to
86.4 million unique values)? I am curious whether Lucene's packed ints play
a role here, or whether it is simply the number of unique values * 4 bytes.


Hi,

Sorry, sending posts from my phone caused these ugly replies.

Jörg, can you please look at this question when you find time?


If you sort on a field with 86.4 million unique values, ES will load these
values into RAM and sort on them.

The "packed ints" feature of Lucene is not important here; it is designed
for high-frequency terms.

Jörg



Hi Jörg,

Thanks for replying!
