@timestamp field slow range query "DocValuesFieldExistsQuery"

YvorL · May 29, 2020, 11:19pm

Hi!

I was profiling my queries and ran into this:

According to this thread, there isn't anything I can do to avoid this behavior, and it seems that even the newest ES version ships an older Lucene version (8.5) that doesn't include the fix (since the fix is very recent).

The field is mapped as 'date', ES version: 7.2.1.

Edit: The daily index in question has 6 shards and the total avg size is ~260GB and has ~600M documents. The timestamp in the event is in seconds.
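
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a profiled range query body. The original request was not posted, so the window values, the `size: 0` setting and the `epoch_second` format hint are assumptions; only the `@timestamp` field name and the fact that the event timestamp is in seconds come from the post above.

```python
import json

def build_profiled_range_query(start_epoch_s, end_epoch_s, field="@timestamp"):
    """Return a search body that profiles a range filter on a date field."""
    return {
        "profile": True,  # ask Elasticsearch to include per-shard timings
        "size": 0,        # assumption: hits are not needed, only the timings
        "query": {
            "bool": {
                "filter": [
                    {
                        "range": {
                            field: {
                                "gte": start_epoch_s,
                                "lt": end_epoch_s,
                                # assumption: bounds are given as epoch seconds
                                "format": "epoch_second",
                            }
                        }
                    }
                ]
            }
        },
    }

# Hypothetical one-day window, purely for illustration.
body = build_profiled_range_query(1590710400, 1590796800)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent to the `_search` endpoint of the daily index; the `profile` section of the response is where a `DocValuesFieldExistsQuery` entry would show up.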

Thanks!

YvorL · May 30, 2020, 12:18am

One more thing I found out is that two specific nodes (all nodes have identical settings and hardware resources) show much slower responses whenever one of their shards is involved:

What could be the issue there?
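
One way to put numbers on the per-node difference is to sum the profiled query time by node. This is a rough sketch, assuming the profile response uses shard IDs of the form `[nodeId][index][shard]` and reports `time_in_nanos` for each top-level query; the sample response is made up and is not the output from this cluster.

```python
import re
from collections import defaultdict

# Shard IDs in the profile output look like "[nodeId][indexName][shardNumber]".
SHARD_ID = re.compile(r"^\[(?P<node>[^\]]+)\]\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]$")

def query_nanos_per_node(profile_response):
    """Sum the top-level query time_in_nanos of every shard, grouped by node ID."""
    totals = defaultdict(int)
    for shard in profile_response["profile"]["shards"]:
        match = SHARD_ID.match(shard["id"])
        node = match.group("node") if match else "unknown"
        for search in shard["searches"]:
            for query in search["query"]:
                totals[node] += query["time_in_nanos"]
    return dict(totals)

# Made-up response fragment with two nodes, for illustration only.
sample = {
    "profile": {
        "shards": [
            {"id": "[node-a][logs-2020.05.29][0]",
             "searches": [{"query": [{"type": "BooleanQuery", "time_in_nanos": 100}]}]},
            {"id": "[node-a][logs-2020.05.29][1]",
             "searches": [{"query": [{"type": "BooleanQuery", "time_in_nanos": 200}]}]},
            {"id": "[node-b][logs-2020.05.29][2]",
             "searches": [{"query": [{"type": "BooleanQuery", "time_in_nanos": 5000}]}]},
        ]
    }
}

per_node = query_nanos_per_node(sample)
print(per_node)
```

If the same two node IDs dominate across many different queries, that points at the nodes rather than at the data in particular shards.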

Ignacio_Vera · June 1, 2020, 10:21am

Hi,

In order to confirm the issue, could you run the query and provide the output of the hot threads while the query is running? It will hopefully tell us where the query is spending most of the time:

https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html
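
As a small sketch of how that request could be built: the hot threads endpoint is `GET /_nodes/hot_threads` and accepts `threads`, `interval` and `type` parameters. The host and the values chosen below are assumptions, not something specific to this cluster.

```python
from urllib.parse import urlencode

def hot_threads_url(base_url, threads=3, interval="500ms", thread_type="cpu"):
    """Build the URL for the nodes hot threads API."""
    params = urlencode({"threads": threads, "interval": interval, "type": thread_type})
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{params}"

# Assumption: a node reachable on localhost:9200 without authentication.
url = hot_threads_url("http://localhost:9200", threads=10)
print(url)

# To capture the output while the slow query is running, something like
#   urllib.request.urlopen(url).read().decode()
# returns the plain-text report, which can then be saved to a file.
```

The report is plain text rather than JSON, so it is easiest to redirect it to a file and share that.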

YvorL · June 2, 2020, 11:00am

@Ignacio_Vera, the output is too long to post here.

Ignacio_Vera · June 2, 2020, 11:09am

Could you add it to a Gist and post the link here?

YvorL · June 2, 2020, 12:05pm

Yep, here it is.

YvorL · June 8, 2020, 7:50am

@Ignacio_Vera, did you have any free time to spare on my question?

Ignacio_Vera · June 8, 2020, 8:25am

Hi @YvorL,

Yes, I had a look, and the hot threads indicate that the process is spending its time caching the query. I am a bit surprised about the timings; does this happen every time you run the query? In theory, you need to run a query a few times (typically 5 times) before it is cached. Does it run quickly when it is not cached?

YvorL · June 8, 2020, 8:58am

I am a bit surprised about the timings, this happens every time you run the query?

Yes, that's why I'm trying to find out the core issue.

In theory you need to run a few times (typically 5 times) for the query to be cached?

Sorry, I'm not sure what you're referring to. I can't utilize ES cache since there are too many requests for different indices and with different bodies. There are terabytes of data indexed that obviously won't fit in any memory. But the issue isn't with caching here but the fact that the profiler shows that it spends a tremendous amount of time checking if the "@timestamp" field exists.

does it run quickly when it is not cached?

I still don't know how to respond to that. No, it's painfully slow, but caching isn't the question here. The query I ran is just an example; there are instances where it takes over 80 seconds, but for the profiler I chose a faster one to find out what's happening.

Thank you for looking into this!

itizir · June 8, 2020, 10:38am

I'm not surprised by timings and behaviour: this seems to match our experience.

The cache should eventually help though, even if the queries are not all identical, unless you indeed have too little memory for the cache to keep the DocValuesFieldExistsQuery at all.
For example, if you repeatedly run just the query "exists":{"field":"@timestamp"} against all relevant indices until it returns almost instantly, then any other query relying on that result (in particular range filters) will no longer be bogged down by it.
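
A rough sketch of that warm-up loop is below. The `search` argument is a hypothetical callable that sends a body to `_search` and returns the parsed response; the 50 ms threshold, the run limit and wrapping the `exists` clause in a `bool` filter (so it runs in filter context) are all assumptions on my part.

```python
import time

def exists_filter_body(field="@timestamp"):
    """Search body that only evaluates an exists filter on the given field."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"exists": {"field": field}}]}},
    }

def warm_until_fast(search, body, fast_ms=50, max_runs=20, pause_s=0.0):
    """Call search(body) repeatedly until the reported 'took' is at most fast_ms.

    `search` is a hypothetical callable returning the parsed search response.
    Returns the list of 'took' values (milliseconds) observed, in order.
    """
    took_history = []
    for _ in range(max_runs):
        took = search(body)["took"]
        took_history.append(took)
        if took <= fast_ms:
            break
        time.sleep(pause_s)
    return took_history
```

If the returned list never drops below the threshold within `max_runs`, that would suggest the cache is not retaining the entry, for instance because it is being evicted.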

Granted, this is not really satisfactory, since in real-life usage the cache won't necessarily get built reliably... That's why I'm very curious whether we will see an improvement in both our use-cases once Lucene 8.6/ES 7.x is finally released.

YvorL · June 22, 2020, 1:21pm

@Ignacio_Vera could you please confirm that the recently released ES version (7.8.0) is indeed using the same Lucene version (8.5.1) as the previous one (7.7.0)?

Thank you!

itizir · June 22, 2020, 1:59pm

Hey @YvorL, nope, sadly Lucene hasn't tagged 8.6.0 yet (it's planned to happen in a couple of weeks or so, though), so ES still hasn't picked up that fix. Hopefully for 7.9.0 (so in 1-2 months?)?

EDIT: Sorry, so the answer to your question is 'yes', not 'no', technically...
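
For anyone who wants to check this on their own cluster: the root endpoint (`GET /`) reports both the ES version and the bundled Lucene version. A minimal sketch of reading them out, using a trimmed sample response with the version numbers mentioned in this thread:

```python
import json

def lucene_version(root_response_text):
    """Return (es_version, lucene_version) from the JSON body of GET /."""
    info = json.loads(root_response_text)
    return info["version"]["number"], info["version"]["lucene_version"]

# Trimmed sample; a real response contains more fields.
sample_root = '{"version": {"number": "7.8.0", "lucene_version": "8.5.1"}}'
versions = lucene_version(sample_root)
print(versions)
```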

YvorL · June 22, 2020, 2:27pm

Thanks!

system · July 20, 2020, 2:27pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
