No, that's not it, I'm afraid. We run the same query over and over and yes, results come much faster after the first run, but still much slower than in 1.7.4. When I say it takes X seconds on 2.1.1 and Y seconds on 1.7.4, I always mean after we have run it several times.
An interesting difference I am noticing is that when running the query on 2.1.1, it creates 1GB worth of fielddata and 0MB worth of filter cache. When I run the same query on 1.7.4 it creates 1.7GB worth of fielddata and 450MB worth of filter cache. So maybe something has changed in the code since 1.7?
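For anyone who wants to reproduce the cache comparison, here is a minimal sketch that sums the fielddata and filter cache sizes from an already-parsed node stats response (the JSON returned by `GET /_nodes/stats/indices`). The function name `cache_bytes` is hypothetical, and the key names are assumptions on my part: I believe 1.x reports the cache under `filter_cache` while 2.x reports it under `query_cache`, so check this against the actual output of your cluster.

```python
def cache_bytes(node_stats):
    """Sum fielddata and filter/query cache memory (bytes) across all nodes.

    `node_stats` is the parsed JSON body of a node stats response.
    """
    totals = {"fielddata": 0, "filter_cache": 0}
    for node in node_stats.get("nodes", {}).values():
        indices = node.get("indices", {})
        fielddata = indices.get("fielddata", {})
        totals["fielddata"] += fielddata.get("memory_size_in_bytes", 0)
        # Assumption: 1.x exposes "filter_cache", 2.x exposes "query_cache".
        cache = indices.get("filter_cache") or indices.get("query_cache") or {}
        totals["filter_cache"] += cache.get("memory_size_in_bytes", 0)
    return totals
```

Running this once after the warm-up runs on each version gives directly comparable numbers, rather than reading them off a monitoring UI.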
Unfortunately I've since torn down that cluster, but we tested by running our most common agg-heavy query hundreds of times against each configuration and came to the same conclusions as @symos.
I've run the query and taken 10 "snapshots" of the hot threads every 1-2 seconds (the query takes around 17 seconds to finish). So this will give you a better idea of where the CPU goes.
Bear in mind the same query on version 1.x takes around 3.5 seconds.
I can also send you the request privately if you need it.
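In case it helps others collect the same kind of data, below is a rough sketch of how the hot threads snapshots can be taken in a loop while the query runs. It assumes a node reachable at `http://localhost:9200` and uses the `/_nodes/hot_threads` endpoint, which returns plain text. The helper names `fetch_hot_threads` and `take_snapshots` are hypothetical, and the count and pause defaults simply mirror the numbers mentioned above.

```python
import time
import urllib.request


def fetch_hot_threads(base_url="http://localhost:9200"):
    """Return the plain-text hot threads report for all nodes."""
    with urllib.request.urlopen(base_url + "/_nodes/hot_threads") as resp:
        return resp.read().decode("utf-8")


def take_snapshots(count=10, pause=1.5, fetch=fetch_hot_threads, sleep=time.sleep):
    """Collect `count` reports, waiting `pause` seconds between them."""
    snapshots = []
    for i in range(count):
        snapshots.append(fetch())
        if i < count - 1:  # no need to wait after the last snapshot
            sleep(pause)
    return snapshots
```

Start the slow query in another terminal, then call `take_snapshots()` and save each entry to its own file so the reports can be compared side by side.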
Are you overriding the index.store.type setting by any chance? I'm surprised that it seems to use niofs while I would expect default_fs. I don't expect it to be the root cause of the problem, but it might contribute.
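A quick way to check for an index-level override is to look at the response of `GET /<index>/_settings`. The sketch below takes that response as an already-parsed dict and lists the indices that set `index.store.type` explicitly. The function name `store_type_overrides` is hypothetical, and it assumes the default nested response layout (not `flat_settings`). As far as I know, a default set in the node's config file would not show up here, so an empty result does not rule that out.

```python
def store_type_overrides(settings_response):
    """Map index name -> explicitly configured index.store.type, if any."""
    overrides = {}
    for index_name, body in settings_response.items():
        index_settings = body.get("settings", {}).get("index", {})
        store = index_settings.get("store", {})
        # Only report indices where the setting is actually present.
        if isinstance(store, dict) and "type" in store:
            overrides[index_name] = store["type"]
    return overrides
```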
Looking at the request and the hot threads, https://github.com/elastic/elasticsearch/pull/15998 (which I already pasted above) should resolve most of the slowdown. This will be available in 2.2, which should be released in the coming weeks. If you still have performance issues after upgrading to 2.2, I would be curious to get new hot threads to see what the new bottleneck is.
That's good to hear, let's hope that this will indeed solve the issue!
As for testing, unfortunately we can't do it right now, since we already reverted to 1.7.4 for our live setup and we'll leave it there for now as we have to deal with other parts of the migration. Our new staging server is not even live yet, so it will be a while before our new setup is fully functional and we're able to test.
So right now it looks like we will wait for 2.2 to be released and upgrade our staging server first to test. I will report back if the issue persists.
Thanks very much for your help and I'm glad we helped identify a problem!