Sorting on a date field

I need to query docs and have the results sorted by a date field.

Before using Elasticsearch, I cast my date (a long of millis since the epoch) to a float and then sorted on that. I did the cast because I read that Lucene could not sort on a long. There was an obvious loss of precision, and if the docs were too close in time, the ordering might be off, which we tolerated. When we switched to Elasticsearch, we did the same thing, running a script on the timestamp saved as a float; again it worked, with some tolerable loss of precision.
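(As an aside, the size of that loss is easy to quantify: a float has a 24-bit significand, and epoch-millis values around 2010 sit near 2^40, so they can only be resolved to about 2^17 = 131,072 ms, a bit over two minutes. A minimal sketch, using an arbitrary made-up timestamp:

```java
// Sketch: why casting epoch millis to float collapses nearby timestamps.
// The timestamp is arbitrary (roughly Nov 2010); any value of similar
// magnitude behaves the same way.
public class FloatPrecisionDemo {
    public static void main(String[] args) {
        long t1 = 1289343074304L;   // millis since epoch
        long t2 = t1 + 60000L;      // one minute later

        // A float's 24-bit significand cannot distinguish values near 2^40
        // that differ by less than half an ulp (65,536 ms here).
        System.out.println((float) t1 == (float) t2);   // prints true

        // The gap between adjacent floats at this magnitude:
        System.out.println(Math.ulp((float) t1));       // prints 131072.0
    }
}
```

Two timestamps a minute apart become the same sort key, so ties between them are broken arbitrarily.)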

Now I want to improve the precision. In a different thread you advised me to save the date as a long and then do a script on that. I find it generally works, but it also lacks precision when the dates get too close (generally under 3 minutes or so apart). This is what I did (sortField referenced a long):

response = indexClient.search(
        Requests.searchRequest(getIndexName())
            .types(documentTypes)
            .searchType(SearchType.QUERY_THEN_FETCH)
            .source(SearchSourceBuilder.searchSource()
                .query(QueryBuilders.customScoreQuery(QueryBuilders.queryString(query))
                    .script("doc['" + sortField + "'].value"))
                .fields(fields).from(offset).size(max).explain(true)))
    .actionGet();

I also tried storing the field as a string and doing a sort on that. It worked and the precision was better, but I could not get the sort order param to work -- I get the same results whether I use SortOrder.ASC or DESC. This is what I did (sortField referenced a string):

response = indexClient.search(
        Requests.searchRequest(getIndexName())
            .types(documentTypes)
            .searchType(SearchType.DFS_QUERY_THEN_FETCH)
            .source(SearchSourceBuilder.searchSource()
                .query(QueryBuilders.queryString(query))
                .fields(fields).from(offset).size(max).explain(true)
                .sort(sortField, SortOrder.DESC)))
    .actionGet();

  1. Is loss of precision expected when doing a script on a long? Is there anything I can do to improve precision?

  2. If I wind up doing a sort on a string, how can I get the sort param to work?

  3. I understand a sort is slower than a script; how much worse is this expected to be?

BTW #1: I know a script will change the score and a sort will not. I'm not too worried about that.
BTW #2: I need to use the from/size params for pagination; not sure if that impacts this decision.

Thanks.

Perhaps a little more context might help:

Before, I tolerated the loss of precision because my search was not paginated. I just returned the top N docs, starting at 0. I could tolerate docs a little out of order, or do my own second sort on the result set to get it perfect.

Now I am paginating. So, if I got 30 docs close in time, and my page size is 10, my pages will be weird; the same doc can show up on pages 1, 2, and 3, and some docs might not show up on any.

Thanks again.

Hey John,
Maybe I am missing some context, as you refer to a previous thread,
but I'm sorting on date fields with no problem and they appear to have
a 1 second granularity.

Thanks,
Paul

On Nov 9, 5:52 pm, John Chang jchangkihte...@gmail.com wrote:


View this message in context:http://elasticsearch-users.115913.n3.nabble.com/Sorting-on-a-date-fie...
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

Hi,

There is no problem sorting on a long (in Lucene or Elasticsearch). You
can simply index a long value with up to 1 millisecond resolution (for
example) and then sort on it. The date type in Elasticsearch actually
indexes a long value, parsing the relevant date string to a long. The
resolution of the long value is based on the date string passed.

When you sort, don't use script-based sorting; just add the field to be
sorted. Its type will be autodetected and the proper sorting will be done.

-shay.banon

On Wed, Nov 10, 2010 at 8:56 AM, Paul ppearcy@gmail.com wrote:


I'm afraid I must not be understanding your advice. I guess I need more specifics about how you want me to (A) map, (B) index and (C) search the doc.

I read your response and then....

I tried mapping the field this way:
"receivedDate" : {
"index_name": "receivedDate",
"type": "date",
"index": "analyzed",
"store": "yes",
"term_vector": "no",
"boost": 1.0,
"omit_norms": "false",
"omit_term_freq_and_positions": "false"
}

I then tried these combinations (data.getReceivedDateInUTC() returns a java.util.Date):

1)
Map<String, Object> document = new HashMap<String, Object>();
document.put("receivedDate", data.getReceivedDateInUTC());

When I searched on the doc with a script:
.script("doc['" + sortField + "'].value")
it worked, but lacked precision with dates close in time (under ~3 min).

1.1)
I indexed the doc as in 1 above, and I searched on the doc with a sort:
.sort(sortField)
I got SearchPhaseExecutionException.

2)
Map<String, Object> document = new HashMap<String, Object>();
document.put("receivedDate", data.getReceivedDateInUTC().getTime());

I could not index the doc; I got:
java.lang.IllegalArgumentException: Invalid format: "1204351200000" is malformed at "0000"

Didn't expect this to work, but was taking a guess because nothing else did.

3)
Map<String, Object> document = new HashMap<String, Object>();
document.put("receivedDate", Long.toString(data.getReceivedDateInUTC().getTime()));

I could not index the doc; I got:
java.lang.IllegalArgumentException: Invalid format: "1204351200000" is malformed at "0000"
(same as #2 above)

Your response mentioned a date string: "The resolution of the long value is based on the date string passed." Hence this test.

4)
Map<String, Object> document = new HashMap<String, Object>();
document.put("receivedDate", data.getReceivedDateInUTC().toString());

I could not index the doc; I got:
org.elasticsearch.index.mapper.MapperParsingException: Failed to parse [receivedDate]

As in 3 above, this was a test based on my understanding of a date string from your response.

Thanks for your help.

You have two options: the first is to have a numeric (long) type for the
field; the second is to have a date type, which expects (by default) an
ISO formatted date string.

If you use the Java API, you have two options:

  1. If you want the field to be numeric, then just add the "milliseconds
    since epoch" value (Date.getTime()) to your Map of values.
  2. If you want the field to be a date type, then either provide the
    formatted string yourself, or pass a Date instance as the value of the
    field; it will automatically be formatted to its ISO form and indexed.
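To make the two options concrete, here is a rough sketch in the spirit of the earlier snippets (the "receivedDate" name comes from this thread; the explicit formatter is only illustrative -- when you pass a Date instance, the client produces the formatted string for you):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

public class DateFieldOptions {
    public static void main(String[] args) {
        Date received = new Date(1289343074304L);   // arbitrary instant

        // Option 1: numeric field -- index millis since epoch as a long.
        Map<String, Object> asLong = new HashMap<String, Object>();
        asLong.put("receivedDate", received.getTime());

        // Option 2: date field -- an ISO-8601 string (or just put the Date
        // itself and let the client format it).
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        iso.setTimeZone(TimeZone.getTimeZone("UTC"));
        Map<String, Object> asDate = new HashMap<String, Object>();
        asDate.put("receivedDate", iso.format(received));

        System.out.println(asLong.get("receivedDate"));   // 1289343074304
        System.out.println(asDate.get("receivedDate"));   // 2010-11-09T22:51:14.304Z
    }
}
```

Both options keep full millisecond resolution, so a plain field sort can distinguish docs that are close together in time.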

In both options, you don't need to explicitly set the mappings; they will
be auto-detected. Since you already have a Date object, I suggest indexing
it directly (it will, in turn, be formatted as an ISO string and detected
as such by the date type).

Once you have indexed it, you will be able to sort on it by just adding it
as a sort field in the search API.

Regarding the resolution, I am not sure I understand what you mean by
missing resolution, but make sure the data you index actually offers the
resolution you expect (does the Date instance you get have that
resolution?).

-shay.banon

On Thu, Nov 11, 2010 at 12:27 AM, John Chang jchangkihtest2@gmail.comwrote:


Thanks! I got my test working, mapping the field as a date. My problem before was that I was using a QueryBuilders.customScoreQuery with a sort, which produced the error. I got rid of the customScoreQuery part and the sort worked great.

Unfortunately, my production dates are not mapped as dates; I foolishly mapped the long values as type string. Until I can get to reindexing the dates as date fields, is there any way I can get the sorting to work with precision while the longs are indexed as strings? I'll need to be able to sort them both ascending and descending. The string value is mapped as index=no, store=yes, in case that makes a difference.
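One property that may work in your favor until the reindex: every epoch-millis value between Sep 2001 (10^12) and Nov 2286 (10^13) is exactly 13 decimal digits, so plain lexicographic order on those strings coincides with numeric order, ascending and descending alike. A quick sanity check of that property, with made-up timestamps:

```java
import java.util.Arrays;
import java.util.Collections;

public class StringOrderDemo {
    public static void main(String[] args) {
        // All values are 13 digits wide, so String order == numeric order.
        String[] millis = { "1289343074304", "1204351200000", "1289343074305" };

        Arrays.sort(millis);                                // ascending
        System.out.println(Arrays.toString(millis));

        Arrays.sort(millis, Collections.reverseOrder());    // descending
        System.out.println(Arrays.toString(millis));
    }
}
```

So precision itself shouldn't suffer from a string sort on these values; whether the index=no mapping allows sorting on the field at all is a separate question.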

We don't have a way of reindexing quickly here - we are working on that.

Thanks again!