I have a query, where part of it is a terms-aggregation.
The query normally takes <500ms, but after inserts/updates to the index it takes
~2000-3000ms the first couple of times.
If I remove the aggregation part of the query, I do not see the same performance problems after inserts/updates.
I am running Elasticsearch 1.7, but I can see from other threads that people experience the same problem on other versions.
The solution used in one of the threads, using filter-aggregation, is not possible for me, since the scoring is key for the query.
The threads I can find are from 2016 and 2017, so maybe someone has experienced it in the meantime, and found an explanation/solution?
What kind of hardware do you have? Are you using doc_values?
Can you upgrade? We are now on 7.2 and so many things happened in the last 3 or 4 years... Including in the JVM itself.
Sadly, I cannot upgrade at the moment, even though it is one of my biggest wishes.
We are using doc_values, yes. As far as I understand, aggregation is not possible without doc_values?
We are running on instances with 4 virtual cores and 16 GB RAM, with 7 GB allocated to the JVM, and 2 TB disks.
So the reason for it being slow after data updates should be that a cache is flushed and has to be filled again from disk, which is slow because it's over the network?
When you update or add data, you are writing new segments to disk. If segment merges need to happen, even more data has to be read from disk.
This means that a new search has to read the new data from disk again.
My problem was not that the disk wasn't local.
I tried to change to local disks, without any improvements.
The problem, however, seemed to be that the field I aggregate on has a very high cardinality (>1 million).
Thus the global ordinals were taking a long time to recompute after data changes.
By default, global ordinals are lazily loaded, that is, on the first search after changes.
By changing them to be eagerly loaded, I pay at insert/refresh time instead of search time.
In my case this is a fine solution, since there are no requirements on how long inserts/updates take.
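For anyone landing here later, this is roughly what the mapping change looks like, written as a small Python sketch that only builds the request body. The field name `category_id` is hypothetical, and the exact mapping syntax depends on your version: as far as I know, 1.x/2.x configure this under `fielddata.loading`, while 5.x and later use a top-level `eager_global_ordinals` flag, so please check the docs for your version before applying it.

```python
import json


def eager_ordinals_mapping(field, legacy=True):
    """Build a mapping body that asks for eager global ordinals on `field`.

    `field` is a hypothetical name; adjust it to the field you aggregate on.
    """
    if legacy:
        # Assumed 1.x/2.x style: eager loading is a fielddata setting
        # on a not_analyzed string field backed by doc_values.
        field_mapping = {
            "type": "string",
            "index": "not_analyzed",
            "doc_values": True,
            "fielddata": {"loading": "eager_global_ordinals"},
        }
    else:
        # Assumed 5.x+ style: a flag directly on a keyword field.
        field_mapping = {"type": "keyword", "eager_global_ordinals": True}
    return {"properties": {field: field_mapping}}


if __name__ == "__main__":
    # Print the body you would send with a put-mapping request.
    print(json.dumps(eager_ordinals_mapping("category_id"), indent=2))
```

The trade-off is the one described above: the ordinals are rebuilt on every refresh, so indexing gets slower while the first aggregation after an update no longer pays that cost.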