After upgrading the cluster from 6.8.4 to 7.5.0, the _count API is much slower in 7.5 than it was in 6.8.4.
In Kibana, run this:
GET /myindex/_count
(against the index or an alias)
In 6.8.4, all _count API responses are sub-second (no performance downgrade with or without an indexing process running).
In 7.5, the response time is over 1 second (sometimes over 20 seconds when an indexing process is running on the index).
I'm seeing the same issue! Seems like the fastest way now might be to use _cat/indices and add up the values yourself. It might not be as accurate, though, and is still slower than older versions of Elasticsearch.
Our old way with _count
time curl -H "Content-Type: application/json" -s --insecure https://localhost:9200/_count
{"count":182116470792,"_shards":{"total":20107,"successful":20107,"skipped":0,"failed":0}}
real 0m6.795s
_cat/count takes about the same time:
time curl -H "Content-Type: application/json" -s --insecure 'https://localhost:9200/_cat/count?format=json'
[{"epoch":"1575661460","timestamp":"19:44:20","count":"182117127449"}]
real 0m6.523s
vs. _cat/indices, which is faster:
time curl -H "Content-Type: application/json" -s --insecure 'https://localhost:9200/_cat/indices?h=index,docs.count&format=json' > /dev/null
real 0m2.351s
@Andy_Wick Thanks for the follow-up. The interesting point is that the _count API in version 6.x and before is much faster (about the same as using _cat/indices). I have not seen any release notes mentioning _count API changes in version 7.x.
I wonder if this is a side effect of the notion of "search idle" added in 7.x. In 6.x and before, we refresh automatically in the background every second by default, so requests always use the available searcher, while in 7.x a request made on a "search idle" shard is parked until the refresh is done. Would you be able to provide us with the output of the hot_threads API (Nodes hot threads API in the Elasticsearch Guide) while the slow query is running? This behavior should only affect the first request that hits a search-idle shard configured with the default index.refresh_interval. You can also opt out of this behavior in 7.x by setting an explicit index.refresh_interval.
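For reference, both requests can be issued from Kibana Dev Tools. This is just a sketch: `myindex` is a placeholder for your own index or pattern, and setting `refresh_interval` to an explicit `1s` (the non-idle default) opts the index out of the search-idle behavior:

```
GET /_nodes/hot_threads

PUT /myindex/_settings
{
  "index": { "refresh_interval": "1s" }
}
```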
Seems like the fastest way now might be to use _cat/indices and add up the values yourself.
This API does not use _search to retrieve docs.count, so it is expected to be faster. It uses the index statistics that are exposed per reader instance and, more importantly, does not check whether a refresh is needed.
I should have mentioned that these are time-based indices. So I'll switch to indices that are NOT being written to, and even manually refresh them, and I still see the same issue. (sessions2-190* will match Jan–Sept.)
time curl -H "Content-Type: application/json" -s --insecure 'https://localhost:9200/sessions2-18*,sessions2-190*/_refresh'
{"_shards":{"total":38075,"successful":38068,"failed":0}}
real 0m5.950s
time curl -H "Content-Type: application/json" -s --insecure 'https://localhost:9200/sessions2-18*,sessions2-190*/_count'
{"count":148953359394,"_shards":{"total":19034,"successful":19034,"skipped":0,"failed":0}}
real 0m5.704s
time curl -H "Content-Type: application/json" -s --insecure 'https://localhost:9200/sessions2-18*/_count'
{"count":34383716799,"_shards":{"total":7226,"successful":7226,"skipped":0,"failed":0}}
real 0m0.806s
time curl -H "Content-Type: application/json" -s --insecure 'https://localhost:9200/sessions2-190*/_count'
{"count":114569642595,"_shards":{"total":11808,"successful":11808,"skipped":0,"failed":0}}
real 0m2.669s
You know what this just pointed out: _count is taking about the same time as _refresh. _count isn't calling refresh when it doesn't need to, is it? These indices haven't been written to for months now.
Sure I can send hot threads privately. Where should I send?
What about my question where if I just do sessions2-18* it's 0.8 seconds, and sessions2-190* it's 2.6 seconds, but both together take 5.7 seconds? That really seems like the number of shards/indices is causing an issue or bug?
That's a lot of shards, and performance will greatly depend on the number of nodes in your cluster. Do you really need that many shards? I am not aware of any change in 7.x that would affect index patterns, but this looks like a big number considering the total number of documents involved. What is your policy for creating new shards?
Yes, it is a decent-sized cluster: 69 nodes, over a PB of data. We used to create 4 indices a day; now we do 1 and are slowly shrinking down the old indices. Whenever I see non-linear growth in execution time (0.8 + 2.6 should mean both together take ~4s, not ~6s), I usually suspect some kind of queueing issue or a loop inside a loop that shouldn't be there.
This API is slower than it used to be; if the answer is that it's working as designed, then I'll let it go. I've already switched to the workaround.
That's hard to say, to be honest. You have a slowdown in 7.x that is unexplained at the moment, but I mentioned the number of shards because that's an actionable item that should speed up your queries. I also wonder if the slowdown you're seeing is in line with what @jihua.zhong describes in the initial post.