Date arithmetic behaviour oddity

Hi,

I have noticed some unexpected behaviour when using date arithmetic in
queries against a system which has a real-time feed of news content being
indexed. Around 10 items per minute are being indexed with a
'publication_time' reflecting the current time. The publication_time field
is mapped explicitly as a date with format "dateOptionalTime".

The objective is to retrieve any items with a publication_time within the
last 5 minutes.
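For comparison, the five-minute window can also be written with absolute bounds computed on the client, so that each request carries a different query string. This is a minimal sketch under assumptions, not something proposed in the thread: the helper name `last_five_minutes_query` is hypothetical, and it assumes the `dateOptionalTime` format accepts ISO 8601 timestamps with a trailing `Z` for UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def last_five_minutes_query(now: Optional[datetime] = None) -> str:
    """Build a query_string range with absolute bounds (hypothetical helper).

    Expects a UTC datetime, because the format below appends a literal 'Z'.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    start = now - timedelta(minutes=5)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601 in UTC; assumed acceptable to dateOptionalTime
    return f"publication_time:[{start.strftime(fmt)} TO {now.strftime(fmt)}]"


print(last_five_minutes_query(datetime(2013, 3, 22, 17, 48, tzinfo=timezone.utc)))
# publication_time:[2013-03-22T17:43:00Z TO 2013-03-22T17:48:00Z]
```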

I have created a brief gist to outline the problem as follows:

In

  • the double use of the 'now' keyword appears to result in a caching
    behaviour whereby the (correct) results of the first execution are returned
    for each subsequent request. Note that I am not using a filter (which might be
    expected to cache by default); the 'now' keywords are deliberately in the
    query_string query. Restarting ("bouncing") elasticsearch then gives a new set
    of (correct) results on first execution.

https://gist.github.com/maharg101/5220406#file-works-as-expected - a single
use of the 'now' keyword combined with an absolute date (in the future) returns
the latest results every time. No caching.
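The gist contents are not reproduced in this thread, so the request bodies below are a hypothetical reconstruction, written as Python dicts that serialise to the JSON search body. They assume the range is expressed in Lucene range syntax inside a `query_string` query; the future date `2020-01-01` is illustrative only.

```python
import json

# Hypothetical reconstruction: 'now' used at both ends of the range.
# Reported above as returning the same (stale) hits on every request in 0.20.5.
double_now = {
    "query": {
        "query_string": {
            "query": "publication_time:[now-5m TO now]"
        }
    }
}

# Hypothetical reconstruction: 'now' only at the lower bound, with an
# absolute future date (chosen for illustration) as the upper bound.
now_and_absolute = {
    "query": {
        "query_string": {
            "query": "publication_time:[now-5m TO 2020-01-01]"
        }
    }
}

# Serialise to the JSON body that would be sent to the _search endpoint.
print(json.dumps(double_now))
print(json.dumps(now_and_absolute))
```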

I am using 0.20.5. Any insights would be welcome, and if you need further
info from me please ask.

Loving elasticsearch btw, if you get a chance to attend their training, do
it !! :o)

cheers,
Graham

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Graham

> I have noticed some unexpected behaviour when using date arithmetic in
> queries against a system which has a real-time feed of news content
> being indexed. Around 10 items per minute are being indexed with a
> 'publication_time' reflecting the current time. The publication_time
> field is mapped explicitly as a date with format "dateOptionalTime".

You are correct. The query string is being cached and reused,
incorrectly.

A range query, however, calculates the new value for now() correctly.
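A minimal sketch of the range-query form, again as a Python dict mirroring the JSON body. It assumes the `gte`/`lte` range parameters; older releases also document `from`/`to` with `include_lower`/`include_upper`, so check the range query reference for the version in use.

```python
import json

# Range query with date math; per this thread, 'now' here is resolved
# afresh on each request rather than reused from a parse cache.
range_body = {
    "query": {
        "range": {
            "publication_time": {
                "gte": "now-5m",  # five minutes before the request time
                "lte": "now",     # the request time
            }
        }
    }
}

print(json.dumps(range_body, indent=2))
```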

See issue #2808, "Date math in query_string caches now()" (elastic/elasticsearch on GitHub).

Also, it turns out that the validate API doesn't calculate date maths at
all, but if it is run before the query_string search, that search will use
the cached query from the validate request.

See issue #2809, "Validate query ignores date maths" (elastic/elasticsearch on GitHub).

clint

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Thanks for the fast response, Clint. It is notable that the combination of now and an absolute date in a range within the query_string does appear to work as expected, as per https://gist.github.com/maharg101/5220406#file-works-as-expected - I know this as I've been joyously watching item headline facets 'from the last 5 minutes' bubbling up on a swanky ajax web page all day :smiley:

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

On Fri, 2013-03-22 at 10:48 -0700, Graham Lenton wrote:

> Thanks for the fast response Clint. It is notable that the combination
> of now and absolute date in a range within the query_string does
> appear to work as expected, as per
> https://gist.github.com/maharg101/5220406#file-works-as-expected - I
> know this as I've been joyously watching item headline facets 'from
> the last 5 minutes' bubbling up on a swanky ajax web page all day :smiley:

What's actually happening here is that the query string is being parsed
once, then cached, for faster execution if it hits any other shards on
the same node.

The query_string cache is small (100 queries), so often, by the time you
run the query again, the cached version has been discarded and the query
works fine.

Unfortunately, if it is the only query being run, then you have to
run it a lot before it works :slight_smile:
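The caching effect described above can be illustrated with a toy model: a cache keyed on the raw query string, where `now` is resolved when the string is parsed. Everything here is hypothetical: the class and function names are invented, the eviction policy is assumed to be LRU, and only the 100-entry size comes from the thread. It is not Elasticsearch's implementation.

```python
from collections import OrderedDict
from datetime import datetime, timedelta


class ParsedQueryCache:
    """Toy LRU cache keyed on the raw query string (hypothetical model)."""

    def __init__(self, capacity=100):  # 100 = size quoted in the thread
        self.capacity = capacity
        self._entries = OrderedDict()  # query string -> resolved (start, end)

    def get_or_parse(self, query, now, parse):
        if query in self._entries:
            # Hit: reuse the window resolved when the string was first parsed.
            self._entries.move_to_end(query)
            return self._entries[query]
        # Miss: date math is resolved at parse time, against the current clock.
        window = parse(query, now)
        self._entries[query] = window
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used entry
        return window


def parse_last_5_minutes(query, now):
    # Hypothetical stand-in for parsing "field:[now-5m TO now]".
    return (now - timedelta(minutes=5), now)


cache = ParsedQueryCache()
q = "publication_time:[now-5m TO now]"
t0 = datetime(2013, 3, 22, 17, 0)

first = cache.get_or_parse(q, t0, parse_last_5_minutes)
# Ten minutes later the same string hits the cache and returns the old window.
stale = cache.get_or_parse(q, t0 + timedelta(minutes=10), parse_last_5_minutes)

# 100 other distinct queries push the entry out, so the next run re-parses.
for i in range(100):
    cache.get_or_parse(f"other_query_{i}", t0, parse_last_5_minutes)
fresh = cache.get_or_parse(q, t0 + timedelta(minutes=20), parse_last_5_minutes)

print(stale == first)
print(fresh[1] == t0 + timedelta(minutes=20))
```

Because the cache key is the literal string, two requests ten minutes apart share one resolved window until enough other queries evict the entry.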

We're looking into what to do.

clint

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.