No. But I would suggest that mapping from a long to an int would
obviously be more performant.
On 25 Sep 2014 15:16, "Bleh" anant.aneja@gmail.com wrote:
Did you run any experiments comparing sorting on dates vs. the custom
scoring suggestion you made?
Initially we too used System.currentTimeMillis(). Then we switched to two
int fields, something like yyyyMMdd and HHmmssSSS.
If the query's time criteria fall within a single date, we don't apply the
yyyyMMdd field for sorting. We get decent performance compared with
System.currentTimeMillis().
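For readers who want to see the split concretely, here is a minimal sketch of turning one epoch-millisecond timestamp into the two int fields described above. The class name TimestampParts and the choice of UTC are assumptions for illustration; the thread does not say which time zone or helper code was actually used.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper: splits an epoch-millisecond timestamp into two ints.
final class TimestampParts {
    final int date;      // yyyyMMdd, e.g. 20140925
    final int timeOfDay; // HHmmssSSS, at most 235959999, so it fits in an int

    private TimestampParts(int date, int timeOfDay) {
        this.date = date;
        this.timeOfDay = timeOfDay;
    }

    // Assumption: UTC is used; the thread does not state a time zone.
    static TimestampParts fromEpochMillis(long epochMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        int date = t.getYear() * 10_000 + t.getMonthValue() * 100 + t.getDayOfMonth();
        int millis = t.getNano() / 1_000_000;
        int timeOfDay = t.getHour() * 10_000_000
                + t.getMinute() * 100_000
                + t.getSecond() * 1_000
                + millis;
        return new TimestampParts(date, timeOfDay);
    }

    public static void main(String[] args) {
        TimestampParts p = fromEpochMillis(System.currentTimeMillis());
        System.out.println(p.date + " " + p.timeOfDay);
    }
}
```

Both values would then be indexed as separate integer fields, so a single-day query only needs the timeOfDay part.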
Hi Jörg,
How much memory will ES take when applying an aggregation on a long field
which contains ~80 million unique long values (1000 * 60 * 60 * 24 =
86400000 unique millis in a day)?
If I understand correctly, it is 80 million * 8 bytes per long, i.e. 86400000 *
8 = 691200000 bytes (659 MB). In the case of yyyyMMdd as an int field,
86400000 * 4 = 345600000 bytes (329 MB).
What is the role of Lucene's PackedInts in this case? Sorry if I am
missing something.
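The arithmetic in this question can be checked with a small sketch. It also shows what bit-packing could save in principle, under the assumption that every value is stored in the minimum number of bits needed for the largest value. This is a back-of-the-envelope estimate with made-up helper names, not a statement about how Elasticsearch or Lucene actually lays out field data in any given version.

```java
// Back-of-the-envelope estimate only; not a measurement of Elasticsearch memory use.
final class FieldMemoryEstimate {
    static final long MILLIS_PER_DAY = 1000L * 60 * 60 * 24; // 86,400,000

    // Bytes needed if every value occupies a fixed number of bytes.
    static long fixedWidthBytes(long valueCount, int bytesPerValue) {
        return valueCount * bytesPerValue;
    }

    // Minimum number of bits needed to represent any value in [0, maxValue].
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    // Bytes needed if every value is packed into bitsPerValue bits (rounded up).
    static long packedBytes(long valueCount, int bitsPerValue) {
        return (valueCount * bitsPerValue + 7) / 8;
    }

    public static void main(String[] args) {
        long n = MILLIS_PER_DAY;
        System.out.println("as long: " + fixedWidthBytes(n, 8) + " bytes");
        System.out.println("as int:  " + fixedWidthBytes(n, 4) + " bytes");
        int bits = bitsRequired(235_959_999L); // largest possible HHmmssSSS value
        System.out.println("packed at " + bits + " bits: " + packedBytes(n, bits) + " bytes");
    }
}
```

Under these assumptions the largest HHmmssSSS value needs 28 bits, so packing would save a little compared with a plain 4-byte int, and much more compared with an 8-byte long.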
yyyyMMdd would map all values of a day to a single integer, and you get
something like "sort by day" or "filter by day".
Assuming the timestamps are spread evenly over one year, you can
estimate: 80 million / 365 = about 219,178 timestamps per day. In the "day
field", you have only 365 integers in the cache instead of 80 million longs
for unique millis. If "day" is too coarse, you can add an hour, minute, or
second index.
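To illustrate the cardinality argument, the following sketch maps evenly spaced timestamps over one year to a yyyyMMdd key and counts the distinct keys. The class and method names are hypothetical, UTC is assumed, and the sample uses one timestamp per minute rather than 80 million values so that it runs quickly.

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.Set;

// Illustration only: how few distinct values a day-granularity field holds.
final class DayCardinality {
    static final long MILLIS_PER_DAY = 86_400_000L;

    // Maps an epoch-millisecond timestamp to a yyyyMMdd int (UTC assumed).
    static int dayKey(long epochMillis) {
        LocalDate d = LocalDate.ofEpochDay(Math.floorDiv(epochMillis, MILLIS_PER_DAY));
        return d.getYear() * 10_000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    // Counts distinct day keys for 'count' timestamps spaced stepMillis apart.
    static int distinctDays(long startMillis, long stepMillis, int count) {
        Set<Integer> days = new HashSet<>();
        for (int i = 0; i < count; i++) {
            days.add(dayKey(startMillis + i * stepMillis));
        }
        return days.size();
    }

    public static void main(String[] args) {
        long start = LocalDate.of(2014, 1, 1).toEpochDay() * MILLIS_PER_DAY;
        // One timestamp per minute for 365 days: 525,600 samples.
        System.out.println("distinct day keys: " + distinctDays(start, 60_000L, 365 * 24 * 60));
        // Timestamps per day if 80 million are spread evenly over the year.
        System.out.println("per day: " + (80_000_000 / 365));
    }
}
```

However many timestamps fall into the year, the day field never holds more than 365 distinct values, which is the point of the estimate above.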
Hi, sorry, I intended to say the HHmmssSSS field. When I apply sorting or aggregations on the HHmmssSSS field, how much memory will it take? In this case the number of unique values for the HHmmssSSS field can be 86400000 (~86.4 million). FYI: we are maintaining daily indexes. When a user searches across days (for example, the last 7 days), I will sort by both yyyyMMdd and HHmmssSSS. If a user searches a single day (for example, today), I will sort by the HHmmssSSS field alone.
Hi Jörg, sorry,
I intended to say the HHmmssSSS field. How much memory will ES take when I
apply sorting or aggregations on the HHmmssSSS field? In this case the number
of unique values for the HHmmssSSS field can be 86400000 (~86.4 million).
Note: we are creating daily indexes. If a user searches
on multiple dates (for example, the last 7 days), then I will sort by both
yyyyMMdd and HHmmssSSS. If a user searches a single day (for example, today),
then I will apply sorting on the HHmmssSSS field alone.
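The sorting rule described here can be sketched in plain Java as a comparator that is chosen from the queried date range. LogEntry and LogSort are hypothetical names for illustration; in Elasticsearch itself the same choice would be expressed as one or two sort fields in the search request rather than as a Java comparator.

```java
import java.util.Comparator;

// Hypothetical log record carrying the two int fields from the thread.
final class LogEntry {
    final int date;      // yyyyMMdd
    final int timeOfDay; // HHmmssSSS

    LogEntry(int date, int timeOfDay) {
        this.date = date;
        this.timeOfDay = timeOfDay;
    }
}

final class LogSort {
    // Picks the sort order from the queried date range (both bounds as yyyyMMdd).
    static Comparator<LogEntry> forRange(int fromDate, int toDate) {
        Comparator<LogEntry> byTime = Comparator.comparingInt(e -> e.timeOfDay);
        if (fromDate == toDate) {
            // Single day: every hit shares the same date, so time of day is enough.
            return byTime;
        }
        // Several days: order by date first, then by time of day.
        return Comparator.<LogEntry>comparingInt(e -> e.date).thenComparing(byTime);
    }
}
```

The single-day branch never touches the date field, which matches the idea of skipping yyyyMMdd whenever the query stays within one daily index.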
Hi Jörg, we are dealing
with logs. If a user debugs his code through logs, I need to apply sorting on
the HHmmssSSS field. yyyyMMdd and HHmmssSSS are both int fields. How costly is
applying a sort on the HHmmssSSS field (86.4 million unique values)? I am
curious to know whether Lucene's PackedInts plays a role here, or is it the
number of unique values * 4 bytes?