Single node with high cpu usage in large number of shards scenario

howardhuang · October 17, 2018, 3:45am

Hi All,

In my test environment, I set up a single ES 6.4.1 node with 1 core and 2GB of memory (1GB heap).
After I created 600 empty indices (no data, no read/write operations) with 20 shards each, 12,000 shards in total, the ES process CPU usage spikes to 90+% every few seconds.

I just want to figure out what causes the high CPU usage, since the cluster holds no data and serves no read/write operations. I tried jstack and hot_threads but couldn't find any clue; I only see some active netty threads. Could anyone give me a hint? Thanks a lot!
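For readers who want to reproduce the hot threads check mentioned above, here is a minimal sketch of calling the nodes hot threads API from Python. The base URL `http://localhost:9200` and the helper names `hot_threads_url` and `fetch_hot_threads` are assumptions for illustration, not something taken from this thread.

```python
import urllib.parse
import urllib.request


def hot_threads_url(base_url="http://localhost:9200", threads=5,
                    interval="500ms", kind="cpu"):
    """Build a GET /_nodes/hot_threads URL (base_url is an assumed address)."""
    query = urllib.parse.urlencode(
        {"threads": threads, "interval": interval, "type": kind}
    )
    return base_url.rstrip("/") + "/_nodes/hot_threads?" + query


def fetch_hot_threads(url):
    """Return the plain-text hot threads report from a reachable node."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

Calling `print(fetch_hot_threads(hot_threads_url()))` against a running node prints the report; raising `threads` or `interval` widens the sample.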

During testing, I disabled the X-Pack features and set some refresh and stats intervals to very long values to rule them out:

"cluster" : {
  "routing" : {
    "allocation" : {
      "node_concurrent_recoveries" : "100"
    }
  },
  "info" : {
    "update" : {
      "interval" : "1d"
    }
  }
},
"indices" : {
  "recovery" : {
    "max_bytes_per_sec" : "400mb"
  }
}

"settings" : {
  "index" : {
    "refresh_interval" : "30m",
    "number_of_shards" : "20",
    "translog" : {
      "sync_interval" : "1h",
      "durability" : "async"
    },
    "provided_name" : "indexi0",
    "max_result_window" : "65536",
    "creation_date" : "1539742381743",
    "store" : {
      "stats_refresh_interval" : "1h"
    },
    "number_of_replicas" : "0",
    "uuid" : "GxZ_ADUzQo6mepQREJe5nA",
    "version" : {
      "created" : "6040199"
    }
  }
}

I also modified the ES code and disabled the async translog and global checkpoint schedulers. I no longer see any frequent logs at trace level, but CPU usage still spikes every few seconds.

Thanks,
Howard

spinscale · October 17, 2018, 7:10am

What was the reason you decided to go with thousands of shards?

This is way too many shards. Try to stick with as few shards as possible. Even one shard will be able to utilize your system resources.

howardhuang · October 17, 2018, 9:15am

Hi Alex, thanks for the reply.

In our production environment, we have some clusters with limited resources and hundreds of shards, and we see CPU usage spike every few seconds. I just scaled up the shard count in the test environment to find out which part of the work uses so much CPU. But I cannot find any clue, since jstack and hot_threads don't provide any useful information.

So if I just create some indices with empty shards and no reads or writes, which part would be the most costly? Shard-level statistics? Could you please point me to some information? Thanks a lot.

Thanks,

Howard

spinscale · October 17, 2018, 10:46am

Each shard comes with its own memory/CPU/disk space requirements even when not much data is indexed, so you can end up with memory pressure resulting in garbage collections, especially on smaller machines.

You should really try to reduce the number of shards, and definitely read the Designing for Scale chapter in the Definitive Guide, this section in particular: https://www.elastic.co/guide/en/elasticsearch/guide/master/kagillion-shards.html
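One way to check for this kind of garbage collection pressure is to sample the node stats API twice and compare the cumulative GC time. The sketch below assumes a single node reachable at `http://localhost:9200`; the helper names are hypothetical, and it only reads the `collection_time_in_millis` counters that `GET /_nodes/stats/jvm` reports per collector.

```python
import json
import time
import urllib.request


def fetch_jvm_stats(base_url="http://localhost:9200"):
    """Fetch JVM stats via GET /_nodes/stats/jvm (base_url is assumed)."""
    with urllib.request.urlopen(base_url + "/_nodes/stats/jvm") as resp:
        return json.load(resp)


def total_gc_millis(stats):
    """Sum cumulative GC time over every collector on every node."""
    total = 0
    for node in stats["nodes"].values():
        for collector in node["jvm"]["gc"]["collectors"].values():
            total += collector["collection_time_in_millis"]
    return total


def gc_fraction(before, after, elapsed_millis):
    """Share of wall-clock time spent in GC between two samples.

    Meaningful as a per-node figure only when the cluster has one node.
    """
    return (total_gc_millis(after) - total_gc_millis(before)) / elapsed_millis


def sample_gc_fraction(base_url="http://localhost:9200", interval_s=10):
    """Take two samples interval_s seconds apart and return the GC share."""
    first = fetch_jvm_stats(base_url)
    time.sleep(interval_s)
    second = fetch_jvm_stats(base_url)
    return gc_fraction(first, second, interval_s * 1000)
```

A result close to 1.0 from `sample_gc_fraction()` would suggest the node spends most of its time collecting garbage rather than doing useful work.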

hope this helps!

--Alex

howardhuang · October 17, 2018, 2:04pm

Thank you Alex! I finally found that it was JVM GC cost. After switching to 1 core and 8GB of memory (6GB heap), there is almost no CPU usage.
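To put the two heap sizes from this thread in perspective, the short sketch below divides the heap evenly across the 12,000 shards. This is plain arithmetic under the assumption that 1GB means 1024^3 bytes; it is not a measurement of what a shard actually needs, and the function name is made up for illustration.

```python
def heap_per_shard_kib(heap_gib, indices, shards_per_index):
    """Evenly divided heap budget per shard, in KiB (simple division only)."""
    total_shards = indices * shards_per_index
    heap_kib = heap_gib * 1024 * 1024  # 1 GiB = 1024 * 1024 KiB
    return heap_kib / total_shards


# Figures from the thread: 600 indices x 20 shards, 1GB vs 6GB heap.
small = heap_per_shard_kib(1, 600, 20)
large = heap_per_shard_kib(6, 600, 20)
print("1GB heap: %.1f KiB per shard" % small)
print("6GB heap: %.1f KiB per shard" % large)
```

With the 1GB heap, each shard gets well under 100 KiB of the budget, which fits the explanation that the JVM was collecting garbage almost constantly.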

Thanks,
Howard

system · November 14, 2018, 2:13pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
