Microsecond dates?


(Mark Hinkle) #1

Question regarding "date" type precision - is milliseconds the highest date
resolution? This post
(http://elasticsearch-users.115913.n3.nabble.com/Sorting-on-a-date-field-td1873287.html#none)
says "The date type in elasticsearch actually indexes a long value,
parsing the relevant date to a long. The resolution of the long value is
based on the date string passed." However, this gist
(https://gist.github.com/3665247) seems to indicate a millisecond limit -
even when specifically setting the date format with microseconds as shown,
the results are out of order and the long shown in the "sort" result is
short a few digits. I know that I can store the whole thing in a long as
microseconds since epoch, or put the seconds into an additional float property
and simply "sort":{"time":{"order":"desc"},"seconds":{"order":"desc"}} to
get microsecond ordering in the results, but I am hoping someone will
say "You're doing it wrong." Thanks in advance.
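
For reference, the first workaround mentioned above (a plain long holding microseconds since the epoch) could be computed roughly as follows. This is only a sketch: the class and method names are made up for illustration, it assumes Java 8+ java.time and an ISO-8601 UTC input string, and the resulting number would presumably be indexed as an ordinary long field rather than a date.

```java
import java.time.Instant;

class EpochMicros {

    // Hypothetical helper: ISO-8601 UTC timestamp -> microseconds since 1970-01-01T00:00:00Z.
    static long toEpochMicros(String isoUtc) {
        Instant t = Instant.parse(isoUtc);
        // getNano() is always in 0..999_999_999, so this also works before the epoch.
        long wholeSeconds = Math.multiplyExact(t.getEpochSecond(), 1_000_000L);
        return Math.addExact(wholeSeconds, t.getNano() / 1_000L);
    }

    public static void main(String[] args) {
        long a = toEpochMicros("2012-09-07T11:34:27.123456Z");
        long b = toEpochMicros("2012-09-07T11:34:27.123457Z");
        // Two instants one microsecond apart stay distinct and ordered.
        System.out.println(a < b);
    }
}
```

Sorting on such a field needs no second sort key, at the cost of losing the date type's formatting and date-range conveniences.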

--


(phill) #2

I'm pretty sure the long in question is the long stored in a java.util.Date:

http://docs.oracle.com/javase/6/docs/api/java/util/Date.html
"The class Date represents a specific instant in time, with
millisecond precision."
"the number of milliseconds since January 1, 1970, 00:00:00 GMT
represented by this date." Just like in Unix systems, but in a 64-bit
long instead of a traditional 32-bit int.

One problem is that dates created by asking the OS for the current
millisecond value often don't have millisecond precision, since some
system clocks (and the system call that reads them) only advance at
an interval larger than 1 millisecond.
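
One rough way to see the clock granularity described above on a given machine is to spin on System.currentTimeMillis() and record the smallest jump between distinct readings. This is a sketch with made-up names, and the result will vary by OS, JVM and hardware.

```java
class ClockTick {

    // Smallest positive step, in milliseconds, seen between consecutive distinct readings.
    static long smallestObservedStepMillis(int samples) {
        long smallest = Long.MAX_VALUE;
        long previous = System.currentTimeMillis();
        int seen = 0;
        while (seen < samples) {
            long now = System.currentTimeMillis();
            if (now != previous) {
                long step = now - previous;
                // Ignore backward jumps (for example, a clock adjustment).
                if (step > 0) {
                    smallest = Math.min(smallest, step);
                }
                previous = now;
                seen++;
            }
        }
        return smallest;
    }

    public static void main(String[] args) {
        System.out.println("smallest step seen (ms): " + smallestObservedStepMillis(20));
    }
}
```

A result larger than 1 suggests the timestamps generated on that machine are coarser than a millisecond to begin with.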

The value formatted for output in the results is probably NOT the long
value used for sorting. Hopefully the code sorts on the actual long value
but returns a friendly string, one that happens to be missing a few digits.

If you really have a source of higher-precision values, you should store
them in something else, but I would use some combination of integers, not
a float.
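
A minimal sketch of the "combination of integers" idea: keep whole seconds in a long and the fractional part as an integer count of microseconds, then sort on both. The class, field and method names are illustrative only, and the choice of seconds plus microseconds is one assumption among several workable splits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SplitTimestamp {
    final long epochSeconds;    // whole seconds since the epoch
    final int microsOfSecond;   // 0..999_999, stored as an integer rather than a float

    SplitTimestamp(long epochSeconds, int microsOfSecond) {
        this.epochSeconds = epochSeconds;
        this.microsOfSecond = microsOfSecond;
    }

    static SplitTimestamp ofEpochMicros(long epochMicros) {
        // floorDiv/floorMod keep the fractional part non-negative for pre-epoch values.
        return new SplitTimestamp(Math.floorDiv(epochMicros, 1_000_000L),
                                  (int) Math.floorMod(epochMicros, 1_000_000L));
    }

    // Primary key first, integer fraction as tie-breaker, newest first.
    static final Comparator<SplitTimestamp> NEWEST_FIRST =
            Comparator.comparingLong((SplitTimestamp t) -> t.epochSeconds)
                      .thenComparingInt(t -> t.microsOfSecond)
                      .reversed();

    public static void main(String[] args) {
        List<SplitTimestamp> events = new ArrayList<>();
        events.add(ofEpochMicros(1_000_001L));
        events.add(ofEpochMicros(2_000_000L));
        events.add(ofEpochMicros(1_000_000L));
        events.sort(NEWEST_FIRST);
        for (SplitTimestamp t : events) {
            System.out.println(t.epochSeconds + " s + " + t.microsOfSecond + " us");
        }
    }
}
```

This mirrors a two-key sort in a search request: the coarse timestamp decides first, and the integer fraction only breaks ties.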

-Paul

--


(Jörg Prante) #3

Hi Mark,

No matter what you do, when you index a date you end up storing a long
in the Lucene index, representing the UTC date at millisecond resolution.
More info at
http://www.elasticsearch.org/guide/reference/mapping/core-types.html

If you want to use float, be aware of the explanations of FloatField at
http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/document/FloatField.html
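
To make the float caveat concrete: a Java float has a 24-bit significand, so the gap between neighbouring representable values grows with magnitude. The sketch below uses Math.ulp to show that gap; the helper names are made up, and the sample values are arbitrary.

```java
class FloatResolution {

    // Distance from the given float to the next larger representable float.
    static float spacingAt(float value) {
        return Math.ulp(value);
    }

    // True if neighbouring floats near this value are at most one microsecond apart.
    static boolean canSeparateMicros(float seconds) {
        return spacingAt(seconds) <= 1e-6f;
    }

    public static void main(String[] args) {
        // Seconds-of-minute with a microsecond fraction.
        System.out.println(spacingAt(23.987654f));
        // Whole epoch seconds around September 2012.
        System.out.println(spacingAt(1.347e9f));
    }
}
```

Around 23.98 the gap is already wider than a microsecond, and at epoch-second magnitudes it is 128 whole seconds, so a float cannot reliably order microsecond-level values.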

Hint: set precision_step to a very large value (Integer.MAX_VALUE) if you
only sort on the date values and never run range queries. It will take less
space on the heap while sorting.

Best regards,

Jörg

--


(Mark Hinkle) #4

Thank you both for your replies. Disappointing but not unexpected
given the observable behavior. Since neither Shay's reply in the older
forum thread I mentioned nor the docs page mentioned by Jörg qualifies
date precision as milliseconds, I had held out hope. :)

One might question the usefulness or clarity of allowing a date format
pattern with more than 3 digits in precision when in reality those
extra digits will never be useable as part of the date/time. On the
other hand, in the (typical?) case where the time zone is at the end
of the date, I can see the necessity of getting past those "extra"
digits in an effort to reach the timezone.

I will take your warnings about the use of a float to heart and follow
Paul's suggestion of just using integer for the fractional seconds.

Thanks again.

--
Mark Hinkle

--


(phill) #5

On 9/7/2012 4:31 PM, Mark Hinkle wrote:

One might question the usefulness or clarity of allowing a date format
pattern with more than 3 digits in precision when in reality those
extra digits will never be useable as part of the date/time. On the
other hand, in the (typical?) case where the time zone is at the end
of the date, I can see the necessity of getting past those "extra"
digits in an effort to reach the timezone.

Again, the behavior is inherited from the Java library classes. The
SimpleDateFormat pattern strings allow you to specify 1 or more capital "S"s.
1, 2 or 3 "S"s can parse and generate milliseconds, producing a value
without trailing zeros.

Curiously, SimpleDateFormat given TOO MANY "S"s after the decimal point
will actually parse the larger value after the dot as total millis (4
digits can mean whole seconds!) and add ALL OF THEM TO THE TIME,
INCREASING THE LARGER FIELDS like seconds and minutes.

When formatting with TOO MANY "S"s, it pads the value with leading zeros,
which does not look like millis but will reparse correctly.

I leave the following code for you on Friday night when I should already
be gone from the office. I have been using Java since version 0.8
(1995) and never knew what would happen if you put more than 3 "S"s
after the dot. Yes, I might question the usefulness too, but I wouldn't
bother to try to get Oracle to fix it now.
It ain't Shay's fault! :)

By the way, do you really have better than millisecond precision?

-Paul

import java.text.DateFormat;
import java.text.SimpleDateFormat;

public class Foo {

 // The outputs in the comments are those from the original post; the zone printed depends on the JVM's time zone.
 static DateFormat sdf1 = new SimpleDateFormat("yyyy-MM-dd G 'at' HH:mm:ss.S z");     // just one S
 static DateFormat sdf2 = new SimpleDateFormat("yyyy-MM-dd G 'at' HH:mm:ss.SSSS z");  // an extra "S"

 public static void main(String[] args) throws Exception {

     // Parse and format with one "S": the digits after the dot are read as a count of milliseconds.
     System.out.println( sdf1.format(sdf1.parse("2012-09-07 AD at 05:14:23.9 PDT") ));      // 2012-09-07 AD at 05:14:23.9 PDT
     System.out.println( sdf1.format(sdf1.parse("2012-09-07 AD at 05:14:23.98 PDT") ));     // 2012-09-07 AD at 05:14:23.98 PDT
     System.out.println( sdf1.format(sdf1.parse("2012-09-07 AD at 05:14:23.987 PDT") ));    // 2012-09-07 AD at 05:14:23.987 PDT
     System.out.println( sdf1.format(sdf1.parse("2012-09-07 AD at 05:14:23.9876 PDT") ));   // 2012-09-07 AD at 05:14:32.876 PDT <-- added 9 seconds!

     // Parse with four "S"s, format with one.
     System.out.println( sdf1.format(sdf2.parse("2012-09-07 AD at 05:14:23.9 PDT") ));      // 2012-09-07 AD at 05:14:23.9 PDT
     System.out.println( sdf1.format(sdf2.parse("2012-09-07 AD at 05:14:23.98 PDT") ));     // 2012-09-07 AD at 05:14:23.98 PDT
     System.out.println( sdf1.format(sdf2.parse("2012-09-07 AD at 05:14:23.987 PDT") ));    // 2012-09-07 AD at 05:14:23.987 PDT
     System.out.println( sdf1.format(sdf2.parse("2012-09-07 AD at 05:14:23.9876 PDT") ));   // 2012-09-07 AD at 05:14:32.876 PDT <-- added 9 seconds
     System.out.println( sdf1.format(sdf2.parse("2012-09-07 AD at 05:14:23.98765 PDT") ));  // 2012-09-07 AD at 05:16:01.765 PDT <-- added 98 seconds

     // Parse with one "S", format with four.
     System.out.println( sdf2.format(sdf1.parse("2012-09-07 AD at 05:14:23.9 PDT") ));      // 2012-09-07 AD at 05:14:23.0009 PDT <-- formatted with misleading zeros
     System.out.println( sdf2.format(sdf1.parse("2012-09-07 AD at 05:14:23.98 PDT") ));     // 2012-09-07 AD at 05:14:23.0098 PDT <-- formatted with misleading zeros
     System.out.println( sdf2.format(sdf1.parse("2012-09-07 AD at 05:14:23.987 PDT") ));    // 2012-09-07 AD at 05:14:23.0987 PDT <-- formatted with misleading zeros

 }
}
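
For contrast, and only as a side note for anyone on Java 8 or later (which postdates this thread): in java.time's DateTimeFormatter the letter "S" means fraction-of-second rather than a millisecond count, so six "S"s parse microseconds without rolling over into the seconds field. The sketch below uses a made-up class name and a simplified pattern without era or zone.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class FractionParsing {

    // Six "S" letters: exactly six fraction-of-second digits, i.e. microseconds.
    static final DateTimeFormatter MICROS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSSSSS");

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, MICROS);
    }

    public static void main(String[] args) {
        LocalDateTime t = parse("2012-09-07 05:14:23.987600");
        // The seconds field is untouched; the fraction lands in the nano-of-second field.
        System.out.println(t.getSecond() + " s, " + t.getNano() + " ns");
    }
}
```

This only changes how the string is parsed on the client side; it says nothing about the resolution the index stores.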

--

