I have noticed some unexpected behaviour when using date arithmetic in
queries against a system which has a real-time feed of news content being
indexed. Around 10 items per minute are being indexed with a
'publication_time' reflecting the current time. The publication_time field
is mapped explicitly as a date with format "dateOptionalTime".
The objective is to retrieve any items with a publication_time within the
last 5 minutes.
I have created a brief gist to outline the problem as follows:
In
the double use of the 'now' keyword appears to result in a caching
behaviour whereby the (correct) results of the first execution are returned
for each subsequent request. Note I am not using a filter (which could be
expected to cache by default) - the 'now' keywords are deliberately in the
query.query string. Bouncing elasticsearch then gives a new set of
(correct) results on first execution.
You are correct. The query string is being cached and reused,
incorrectly.
A range query, however, calculates the new value for now() correctly.
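For illustration, the "last 5 minutes" request could be expressed as a range query along these lines. This is only a sketch: the helper name `recent_items_query` is made up, and the `gte`/`lte` bounds with `now-5m` are an assumption about how the original query would translate, not something taken from the gist.

```python
import json


def recent_items_query(field="publication_time", window="5m"):
    """Build a search body that uses a range query with date math.

    Hypothetical helper: the field name comes from the thread, the
    rest is an assumed translation of "items from the last 5 minutes".
    """
    return {
        "query": {
            "range": {
                field: {
                    "gte": "now-" + window,  # lower bound, relative to now
                    "lte": "now",            # upper bound, the current time
                }
            }
        }
    }


body = recent_items_query()
# This JSON would be sent as the body of a _search request.
print(json.dumps(body, indent=2))
```

Because the date math lives in a range query rather than in the query string text, it should not be affected by the query string caching described above.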
Also, it turns out that the validate API doesn't calculate date maths at
all, but if it is run before the query string, the query string will use
the cached query from the validate request.
Thanks for the fast response Clint. It is notable that the combination of now and an absolute date in a range within the query_string does appear to work as expected, as per https://gist.github.com/maharg101/5220406#file-works-as-expected - I know this as I've been joyously watching item headline facets 'from the last 5 minutes' bubbling up on a swanky ajax web page all day.
On Fri, 2013-03-22 at 10:48 -0700, Graham Lenton wrote:
Thanks for the fast response Clint. It is notable that the combination
of now and absolute date in a range within the query_string does
appear to work as expected, as per elasticsearch date arithmetic behaviour oddity · GitHub - I
know this as I've been joyously watching item headline facets 'from
the last 5 minutes' bubbling up on a swanky ajax web page all day
What's actually happening here is that the query string is being parsed
once, then cached, for faster execution if it hits any other shards on
the same node.
The query_string cache is small - 100 queries, so often by the time you
run it again, the cached version will have been discarded and the query
will work fine.
Unfortunately, if it is the only query being run, then you have to
run it a lot before it works.
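The effect described above can be reproduced with a toy model. This is not Elasticsearch's actual code: the class `ParsedQueryCache`, the oldest-first eviction, and the example query text are all assumptions made for the sketch. Only the cache size of 100 and the parse-once-then-reuse behaviour come from the explanation above.

```python
from collections import OrderedDict
from datetime import datetime, timedelta


class ParsedQueryCache:
    """Toy cache keyed on the raw query text (hypothetical, for illustration)."""

    def __init__(self, max_size=100):
        self.max_size = max_size
        self._entries = OrderedDict()

    def parse(self, query_text, now):
        # Cache hit: reuse the parsed form, including the old value of "now".
        if query_text in self._entries:
            self._entries.move_to_end(query_text)
            return self._entries[query_text]
        # Cache miss: "parse" by resolving now-5m and now to absolute times.
        parsed = (now - timedelta(minutes=5), now)
        self._entries[query_text] = parsed
        # Assumed eviction policy: drop the least recently used entry.
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)
        return parsed


cache = ParsedQueryCache(max_size=100)
query = "publication_time:[now-5m TO now]"  # assumed query text
t0 = datetime(2013, 3, 22, 10, 0, 0)

first = cache.parse(query, t0)
later = cache.parse(query, t0 + timedelta(minutes=30))
stale = later == first  # the cached bounds are returned unchanged

# Enough other queries push the stale entry out of the cache.
for i in range(100):
    cache.parse("other query %d" % i, t0)
fresh = cache.parse(query, t0 + timedelta(minutes=30))
```

In this model the repeated query keeps its original time window until 100 other distinct queries have passed through, which matches the observation that a busy node appears to behave correctly while a node running a single query does not.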