Aggregations in 2.1.0 much slower than 1.6.0

No, that's not it I'm afraid. We run the same query over and over and yes, results come much faster after the first run, but still much slower than in 1.7.4. When I say it takes X seconds on 2.1.1 and Y seconds on 1.7.4, I always mean after we have run it several times.

An interesting difference I am noticing is that when running the query on 2.1.1, it creates 1GB worth of fielddata and 0MB worth of filter cache. When I run the same query on 1.7.4 it creates 1.7GB worth of fielddata and 450MB worth of filter cache. So maybe something has changed in the code since 1.7?
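
For anyone who wants to reproduce the comparison, these sizes can be read from the nodes stats API. A sketch, assuming a node on localhost:9200 (note the filter cache was renamed to the query cache in 2.x):

```
# Fielddata and filter cache sizes on 1.x:
curl -s 'localhost:9200/_nodes/stats/indices/fielddata,filter_cache?pretty'
# On 2.x the filter cache is exposed as the query cache:
curl -s 'localhost:9200/_nodes/stats/indices/fielddata,query_cache?pretty'
```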

Could you share your request and try to capture hot threads after fielddata has been loaded already to see where CPU goes in that case?
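
Something like this would work to take a series of snapshots while the query runs (a sketch, assuming the node listens on localhost:9200):

```
# Capture 10 hot-threads snapshots, roughly 2 seconds apart,
# while the slow aggregation runs in another terminal.
for i in $(seq 1 10); do
  curl -s 'localhost:9200/_nodes/hot_threads' > hot_threads_$i.txt
  sleep 2
done
```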

Unfortunately I've since torn down that cluster, but we tested by running our most common agg-heavy query hundreds of times against each configuration and came to the same conclusions as @symos.

OK, how about this one:
https://dl.dropboxusercontent.com/u/23087609/hot_threads.zip

I've run the query and taken 10 "snapshots" of the hot threads every 1-2 seconds (the query takes around 17 seconds to finish). So this will give you a better idea of where the CPU goes.

Bear in mind the same query on version 1.x takes around 3.5 seconds.

I can also send you the request privately if you need it.

That would help, thanks. Can you send it to adrien (at) elastic.co?

Are you overriding the index.store.type setting by any chance? I'm surprised that it seems to use niofs while I would expect default_fs. I don't expect it to be the root cause of the problem, but it might contribute.
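
You can double-check with something like this (a sketch; "myindex" is a placeholder for your index name):

```
# Look for an explicit store type in the index settings:
curl -s 'localhost:9200/myindex/_settings?pretty' | grep -A 1 '"store"'
# ...and in the node-level settings:
curl -s 'localhost:9200/_nodes/settings?pretty' | grep -A 1 '"store"'
```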

I may have found the reason: https://github.com/elastic/elasticsearch/pull/15998


Good catch.

No, we're not overriding index.store.type, we just checked.

I also sent you the request via email. It might help confirm whether the bug you mention is indeed what's causing the problem.

Hmm, I could not find any email regarding this. Can you check that the email actually got sent?

Oh nevermind, I just found it.

Looking at the request and the hot threads, https://github.com/elastic/elasticsearch/pull/15998 (which I already pasted above) should help resolve most of the slowdown. This will be available in 2.2, which should be released in the coming weeks. If you still have performance issues after upgrading to 2.2, then I would be curious to get new hot threads to see what the new bottleneck is.

If you are willing to check sooner, you could take a snapshot of your data and restore it into a nightly build of Elasticsearch to compare performance. https://oss.sonatype.org/content/repositories/snapshots/org/elasticsearch/distribution/tar/elasticsearch/2.3.0-SNAPSHOT/elasticsearch-2.3.0-20160119.093021-82.tar.gz
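
In case it helps, the flow would look roughly like this (a sketch; the repository name, path, and ports are placeholders, and both clusters need the location listed under path.repo):

```
# On the existing cluster: register a filesystem repository and take a snapshot.
curl -XPUT 'localhost:9200/_snapshot/migration_test' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/backups/migration_test" }
}'
curl -XPUT 'localhost:9200/_snapshot/migration_test/snap_1?wait_for_completion=true'

# On the nightly-build cluster: register the same repository and restore.
curl -XPUT 'localhost:9201/_snapshot/migration_test' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/backups/migration_test" }
}'
curl -XPOST 'localhost:9201/_snapshot/migration_test/snap_1/_restore'
```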

That's good to hear, let's hope that this will indeed solve the issue!

As for testing, unfortunately we can't do it right now, since we already reverted to 1.7.4 for our live setup and we'll leave it there for now as we have to deal with other parts of the migration. Our new staging server is not even live yet, so it will be a while before our new setup is fully functional and we're able to test.

So right now it looks like we will wait for 2.2 to be released and upgrade our staging server first to test. I will report back if the issue persists.

Thanks very much for your help and I'm glad we helped identify a problem!

Please do! Thanks for helping track down this problem.