After upgrading from 2.4 to 5.1.2, we can see a tremendous increase in response times for simple document gets (issued from Java's get API over the transport protocol).
Previously, with 2.4, the response times were in the range of a few milliseconds (1-5); now there are responses above 1-2 seconds, but only a few. The median seems to be OK.
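For what it's worth, the gets are plain single-document lookups over the transport client. A minimal sketch of how one of them can be timed (cluster name, host, index, type and id below are placeholders, not our real setup):

```java
import java.net.InetAddress;

import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class TimedGet {
    public static void main(String[] args) throws Exception {
        Settings settings = Settings.builder()
                .put("cluster.name", "my-cluster")   // placeholder cluster name
                .build();
        try (TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("localhost"), 9300))) {
            long start = System.nanoTime();
            // single-document get, no query/search involved
            GetResponse response = client.prepareGet("my-index", "my-type", "1").get();
            long tookMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("exists=" + response.isExists() + ", took=" + tookMs + " ms");
        }
    }
}
```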
The slow log documentation says that there is no slow query logging for single-document gets, only for indexing and searches.
So what's the preferred method of finding out why these get requests are slow?
If you run the GET API, no query is issued, so no slow log entry is created. You can use the hot threads API to check what is causing high CPU usage.
The nodes stats API might also be worth a look (and worth comparing against your old system).
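For example, something along these lines from the Java API (a sketch only, assuming a connected `Client` named `client`; method names in the class are just for illustration):

```java
import org.elasticsearch.action.admin.cluster.node.hotthreads.NodeHotThreads;
import org.elasticsearch.action.admin.cluster.node.hotthreads.NodesHotThreadsResponse;
import org.elasticsearch.action.admin.cluster.node.stats.NodeStats;
import org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse;
import org.elasticsearch.client.Client;

public class NodeDiagnostics {

    // Roughly the equivalent of GET /_nodes/hot_threads:
    // prints the busiest threads per node.
    static void printHotThreads(Client client) {
        NodesHotThreadsResponse response = client.admin().cluster()
                .prepareNodesHotThreads()
                .get();
        for (NodeHotThreads node : response.getNodes()) {
            System.out.println("--- " + node.getNode().getName() + " ---");
            System.out.println(node.getHotThreads());
        }
    }

    // Roughly the equivalent of GET /_nodes/stats:
    // here only the JVM heap usage per node is printed.
    static void printHeapUsage(Client client) {
        NodesStatsResponse response = client.admin().cluster()
                .prepareNodesStats()
                .setJvm(true)
                .get();
        for (NodeStats node : response.getNodes()) {
            System.out.println(node.getNode().getName() + " heap used: "
                    + node.getJvm().getMem().getHeapUsed());
        }
    }
}
```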
Due to the amount of data I only had a quick look. You are spending a lot of time in warmers that create parent/child data structures. Why is that? Could you use eager_global_ordinals instead?
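If the cost really is global ordinals being built lazily, one option is to have them built eagerly at refresh time instead. A hedged sketch of setting eager_global_ordinals on a keyword field at index-creation time (index, type and field names are made up, not from your cluster):

```java
import org.elasticsearch.client.Client;

public class EagerOrdinalsSetup {

    // Sketch: create an index whose keyword field loads its global ordinals
    // eagerly on refresh instead of lazily on first use.
    // "my-index", "my-type" and "my_field" are placeholder names.
    static void createIndexWithEagerOrdinals(Client client) {
        String mapping =
                "{ \"properties\": { " +
                "    \"my_field\": { " +
                "      \"type\": \"keyword\", " +
                "      \"eager_global_ordinals\": true " +
                "    } " +
                "} }";
        client.admin().indices().prepareCreate("my-index")
                .addMapping("my-type", mapping)
                .get();
    }
}
```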
Can I narrow this down to at least an index or a mapping? We don't use parent/child explicitly (I don't know whether Elasticsearch uses it internally somehow), although we did have a mapping where it was defined but left unused.
So I can't answer this question. It would be good to see in which index/mapping this happens.
Also, in ES 2.4 we didn't see this behaviour (I don't have a hot_threads dump, but the response times were quick). It started to appear with the upgrade to 5.1.2. Is there something that has changed considerably in this area?
Wait, something is weird here. You said you are using Elasticsearch 5.1.2, but warmers were removed in 5.1. Are you sure you don't have another version running? I'm confused now.
That looks good. You also said that you don't use parent/child. Elasticsearch does not use this unless it is configured explicitly. You can use the _mapping endpoint to get the mappings of all indices.
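From the Java side the equivalent is the get-mappings admin call. A quick sketch (again assuming an existing `Client` named `client`) that dumps every mapping as JSON, so any leftover "_parent" definition stands out:

```java
import com.carrotsearch.hppc.cursors.ObjectObjectCursor;

import org.elasticsearch.action.admin.indices.mapping.get.GetMappingsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.cluster.metadata.MappingMetaData;
import org.elasticsearch.common.collect.ImmutableOpenMap;

public class MappingDump {

    // Roughly the equivalent of GET /_mapping: print every index/type mapping.
    static void dumpMappings(Client client) throws Exception {
        GetMappingsResponse response = client.admin().indices()
                .prepareGetMappings()   // no arguments = all indices
                .get();
        for (ObjectObjectCursor<String, ImmutableOpenMap<String, MappingMetaData>> index
                : response.getMappings()) {
            for (ObjectObjectCursor<String, MappingMetaData> type : index.value) {
                System.out.println(index.key + "/" + type.key + ": "
                        + type.value.source().string());
            }
        }
    }
}
```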
So there are still warmers for some tasks, like the parent/child handling. Can you delete that parent/child index (it seems to be empty anyway) and see if those warmers and their CPU usage go away (just a guess here)?
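If that empty index really is disposable, a minimal sketch of deleting it and then re-checking hot threads (the index name here is a placeholder):

```java
import org.elasticsearch.action.admin.cluster.node.hotthreads.NodeHotThreads;
import org.elasticsearch.client.Client;

public class DropParentChildIndex {

    // Delete the (apparently empty) index that defines the parent/child mapping,
    // then look at hot threads again to see whether the warmer activity is gone.
    static void deleteAndRecheck(Client client) {
        client.admin().indices()
                .prepareDelete("parent-child-index")   // placeholder index name
                .get();

        for (NodeHotThreads node : client.admin().cluster()
                .prepareNodesHotThreads()
                .get()
                .getNodes()) {
            System.out.println(node.getHotThreads());
        }
    }
}
```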
Also, is there constant CPU usage by the warmers? Like all the time? Have you run a couple more hot threads requests?
I'm deeply wondering how any warmers can consume CPU when you don't have any indices with parent/child that contain data.
Sadly, we have other, valuable data in those indices, but we will reindex them to see what changes.
By the way, the same was the case with ES 2.4, and there the performance was OK. Do you know of any changes which could cause this?
I will report back with the findings once the indices have been reindexed.
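For the reindex itself we will probably go through the reindex API. A rough sketch from the Java side (this assumes the reindex module/plugin is available to the transport client; the index names are placeholders):

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.index.reindex.ReindexAction;

public class ReindexSketch {

    // Roughly the equivalent of POST /_reindex: copy all documents from the
    // old index into a fresh one created without the unused _parent mapping.
    static void reindex(Client client) {
        ReindexAction.INSTANCE.newRequestBuilder(client)
                .source("old-index")            // placeholder source index
                .destination("old-index-v2")    // placeholder destination index
                .get();
    }
}
```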
After reindexing all indices, things look better. The warmers' high CPU usage has disappeared; hot_threads now shows only some merge tasks and snapshot-related work (backups, probably).
Also, after the reindex, the extremely long get responses are gone. This is what I can see for today:
```
     N         Min        Max         Median     Avg         Stddev
x    89971255  0.255709   3056.4011   1.165522   2.5856993   17.446189
```
So it seems that the parent/child mapping, which never had any documents, caused some of the trouble.
Thanks,