According to this thread there's isn't anything I can do to avoid this behavior and it seems that the newest ES version has older Lucene version (8.5) which doesn't have the fixed issue (since it's very recent).
The field is mapped as 'date', ES version: 7.2.1.
Edit: The daily index in question has 6 shards and the total avg size is ~260GB and has ~600M documents. The timestamp in the event is in seconds.
One more thing I found out is that two specific nodes (all nodes have 100% same settings and hardware resources) are showing much slower responses if they have a shard involved:
In order to confirm the issue, could you run the query and provide the output of the hot threads while the query is running? It will hopefully tell us where the query is spending most of the time:
Yes I had a look at it looks like hot threads indicate that the process is spending its time in caching the query. I am a bit surprised about the timings, this happens every time you run the query? In theory you need to run a few times (typically 5 times) for the query to be cached? does it run quickly when it is not cached?
I am a bit surprised about the timings, this happens every time you run the query?
Yes, that's why I'm trying to find out the core issue.
In theory you need to run a few times (typically 5 times) for the query to be cached?
Sorry, I'm not sure what you're referring to. I can't utilize ES cache since there are too many requests for different indices and with different bodies. There are terabytes of data indexed that obviously won't fit in any memory. But the issue isn't with caching here but the fact that the profiler shows that it spends a tremendous amount of time checking if the "@timestamp" field exists.
does it run quickly when it is not cached?
I still don't know how to respond to that. No, it's painfully slow but caching isn't the question here. The query I ran is an example, there are instances where it finishes over 80 seconds but in the profiler, I chose a speedier one to find out what's happening.
I'm not surprised by timings and behaviour: this seems to match our experience.
The cache should eventually help though, even if the queries are not all identical, unless you indeed have too little memory for the cache to keep the DocValuesFieldExistsQuery at all.
e.g. if you run as query just"exists":{"field":"@timestamp"} on all relevant indices continuously until it returns instantaneously, then any other query relying on that result (in particular range filters) will no longer be bogged down by this.
Granted this is not really satisfactory since in a real-life usage the building of the cache won't necessarily happen very well... That's why I'm very curious whether we will see an improvement in both our use-cases once Lucene 8.6/ES 7.x is finally released.
@Ignacio_Vera could you please confirm that the recently released ES version (7.8.0) is indeed using the same Lucene version (8.5.1) as the previous one (7.7.0)?
Hey @YvorL, nope, sadly Lucene hasn't tagged 8.6.0 yet (it's planned to happen in a couple of weeks or so, though), so ES still hasn't picked up that fix. Hopefully for 7.9.0 (so in 1-2 months?)?
EDIT: Sorry, so the answer to your question is 'yes', not 'no', technically...
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.