CPU at 99% - Need help understanding hot_threads, please


(Camilo Sierra) #1

85.2% (426.1ms out of 500ms) cpu usage by thread 'elasticsearch[Data 1-hot][bulk][T#1]'
3/10 snapshots sharing following 19 elements
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:554)
org.elasticsearch.index.mapper.object.ObjectMapper.serializeNonDynamicArray(ObjectMapper.java:685)
org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:489)
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:493)
org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:409)
org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:515)
org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:232)
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
2/10 snapshots sharing following 18 elements
org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:554)
org.elasticsearch.index.mapper.object.ObjectMapper.serializeNonDynamicArray(ObjectMapper.java:685)
org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:489)
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:493)


(Camilo Sierra) #2
   org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:409)
   org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:515)
   org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:232)
   org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
   org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
   org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   java.lang.Thread.run(Thread.java:745)
 5/10 snapshots sharing following 11 elements
   org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
   org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:493)
   org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:409)
   org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:515)
   org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:232)
   org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
   org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
   org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   java.lang.Thread.run(Thread.java:745)
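
A dump like the one above is easier to read once it is reduced to one line per snapshot group. The following is only a rough sketch, assuming the plain-text layout shown in this thread (a "cpu usage by thread" header followed by "N/10 snapshots sharing following ..." groups); the `summarize` helper and the shortened `SAMPLE` excerpt are illustrative and not part of Elasticsearch.

```python
import re

# Header of each hot thread, e.g.
# "85.2% (426.1ms out of 500ms) cpu usage by thread 'elasticsearch[...][bulk][T#1]'"
THREAD_RE = re.compile(r"([\d.]+)% \(.*?\) cpu usage by thread '(.+)'")
# Start of a snapshot group, e.g. "3/10 snapshots sharing following 19 elements"
GROUP_RE = re.compile(r"(\d+)/(\d+) snapshots sharing following (\d+) elements")

# Shortened excerpt of the dump posted above (only the first frames of each group).
SAMPLE = """\
85.2% (426.1ms out of 500ms) cpu usage by thread 'elasticsearch[Data 1-hot][bulk][T#1]'
 3/10 snapshots sharing following 19 elements
   org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
   org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
 2/10 snapshots sharing following 18 elements
   org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
 5/10 snapshots sharing following 11 elements
   org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
"""


def summarize(text):
    """Return one dict per hot thread with its CPU share and the
    top (innermost) frame of every snapshot group."""
    threads = []
    want_top_frame = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        thread = THREAD_RE.match(line)
        group = GROUP_RE.match(line)
        if thread:
            threads.append(
                {"cpu": float(thread.group(1)), "name": thread.group(2), "groups": []}
            )
            want_top_frame = False
        elif group and threads:
            threads[-1]["groups"].append(
                {"snapshots": int(group.group(1)), "top_frame": None}
            )
            want_top_frame = True
        elif want_top_frame:
            # First frame after a group line is the innermost call.
            threads[-1]["groups"][-1]["top_frame"] = line
            want_top_frame = False
    return threads


for t in summarize(SAMPLE):
    print(f"{t['cpu']}% {t['name']}")
    for g in t["groups"]:
        print(f"  {g['snapshots']} snapshots, top frame: {g['top_frame']}")
```

For this dump, every group sits under `DocumentMapper.parse` on a bulk thread, which suggests the CPU time is going into parsing documents during bulk update operations.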

(Jeferson Martins) #3

How many shards are in your cluster?
Are you running a snapshot at this time?


(Jeferson Martins) #4

And how much free disk space do you have?


(Camilo Sierra) #5

We have 433 shards in the cluster, but only 3 shards in the index that is causing trouble. These 3 shards are spread across 3 nodes, and this is the only index that is indexing documents.

We are not running any snapshots in this cluster.

Disk usage is at 26%.


(Jeferson Martins) #6

Are the shards in the same index?

Are there replicas for this index?

Have you changed the bulk queue or size?


(Camilo Sierra) #7

Yes, same index. At the beginning it had replicas, but I deleted them when the problem started.
I haven't changed the size or the queue. It worked for the first 18 days, and yesterday the traffic was normal, but the CPU usage skyrocketed.


(Jeferson Martins) #8

Maybe the Elasticsearch hosts can't support the size of your cluster.

How much RAM and CPU are you using?

In my case, I had a cluster with 10 hosts and 7 GB of RAM each. I had an index with more than 100 GB of data, and my cluster went down every day. I decided to use another kind of host in AWS, with more RAM and CPU, and fewer indices in the cluster at the same time.

Have you considered adding more CPU and RAM?
Have you checked bulk.queue and bulk.size?

/_cat/thread_pool?v&h=name,host,bulk.active,bulk.rejected,bulk.completed,bulk.queue,bulk.queueSize
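
If it helps, here is a minimal sketch of how the output of that request could be checked with a script. The sample text is made up for illustration (hostnames and numbers are hypothetical, and only a subset of the requested columns is shown), and the helper names `parse_cat_table` and `saturated_nodes` are my own. It assumes the whitespace-separated table with a header row that `?v` produces, and a bounded bulk queue so that `bulk.queueSize` is a number.

```python
# Hypothetical sample of `_cat/thread_pool?v&h=...` output; all values are invented.
SAMPLE = """\
host   bulk.active bulk.rejected bulk.completed bulk.queue bulk.queueSize
data-1           8          1532         904211         50             50
data-2           2             0         899870          0             50
data-3           1             0         901122          3             50
"""


def parse_cat_table(text):
    """Turn whitespace-separated _cat output (with the ?v header row) into dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def saturated_nodes(rows):
    """Return hosts that have rejected bulk requests or whose bulk queue is full."""
    flagged = []
    for row in rows:
        rejected = int(row["bulk.rejected"])
        queue = int(row["bulk.queue"])
        queue_size = int(row["bulk.queueSize"])
        if rejected > 0 or queue >= queue_size:
            flagged.append(row["host"])
    return flagged


rows = parse_cat_table(SAMPLE)
print(saturated_nodes(rows))
```

A node that shows up here is receiving bulk requests faster than its bulk threads can process them, which would fit with the bulk threads being busy in the hot_threads output.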

