High CPU usage on Elasticsearch nodes

Hi,
I am using ELK 6.4.0. The ES cluster nodes show high CPU usage without much load on the cluster, and I can see garbage collection pauses in the logs.
ES configuration:
1. 5-node cluster
2. 8 cores and 32 GB RAM per node
3. 5 shards per index

I have tried the following:
  1. Increased the heap from 8 GB to 12 GB (the machine has 32 GB of RAM)
  2. Set -XX:NewRatio=4 in the JVM options to balance the young and old generation sizes

Even so, CPU usage climbs above 90% on one of the nodes, and later, one by one, all nodes get close to 100%.
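
The dump below can be reproduced with the nodes hot threads API:

    GET /_nodes/hot_threads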

Hot threads output:

      98.0% (490.1ms out of 500ms) cpu usage by thread 'elasticsearch[esnode_42_01][write][T#7]'
        10/10 snapshots sharing following 52 elements
       java.math.BigInteger.square(BigInteger.java:1899)
       java.math.BigInteger.squareToomCook3(BigInteger.java:2054)
       java.math.BigInteger.square(BigInteger.java:1899)
       java.math.BigInteger.squareToomCook3(BigInteger.java:2054)
       java.math.BigInteger.square(BigInteger.java:1899)
       java.math.BigInteger.squareToomCook3(BigInteger.java:2049)
       java.math.BigInteger.square(BigInteger.java:1899)
       java.math.BigInteger.squareToomCook3(BigInteger.java:2051)
       java.math.BigInteger.square(BigInteger.java:1899)
       java.math.BigInteger.pow(BigInteger.java:2306)
       java.math.BigDecimal.bigTenToThe(BigDecimal.java:3543)
       java.math.BigDecimal.bigMultiplyPowerTen(BigDecimal.java:3676)
       java.math.BigDecimal.setScale(BigDecimal.java:2445)
       java.math.BigDecimal.toBigInteger(BigDecimal.java:3025)
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.toLong(AbstractXContentParser.java:195)
       org.elasticsearch.common.xcontent.support.AbstractXContentParser.longValue(AbstractXContentParser.java:220)
       org.elasticsearch.index.mapper.NumberFieldMapper$NumberType$7.parse(NumberFieldMapper.java:679)
       org.elasticsearch.index.mapper.NumberFieldMapper$NumberType$7.parse(NumberFieldMapper.java:655)
       org.elasticsearch.index.mapper.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:1010)
       org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:297)
       org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:481)
       org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:608)
       org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:403)
       org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:380)
       org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:95)
       org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:69)
       org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:263)
       org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:725)
       org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:702)
       org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnReplica(IndexShard.java:689)
       org.elasticsearch.action.bulk.TransportShardBulkAction.performOpOnReplica(TransportShardBulkAction.java:524)
       org.elasticsearch.action.bulk.TransportShardBulkAction.performOnReplica(TransportShardBulkAction.java:492)
       org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:479)
       org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:73)
       org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.onResponse(TransportReplicationAction.java:564)
       org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.onResponse(TransportReplicationAction.java:527)
       org.elasticsearch.index.shard.IndexShard$3.onResponse(IndexShard.java:2357)
       org.elasticsearch.index.shard.IndexShard$3.onResponse(IndexShard.java:2337)
       org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:271)
       org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:238)
       org.elasticsearch.index.shard.IndexShard.acquireReplicaOperationPermit(IndexShard.java:2336)
       org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.doRun(TransportReplicationAction.java:633)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:510)
       org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:490)
       org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)

Can someone suggest how to troubleshoot this?

Thanks in advance

It looks like your CPUs are spending a lot of time trying to convert numbers into longs from some format that doesn't look much like a long. The fast path expects a sequence of decimal digits that fits in a long (with an optional leading sign) but if that fails we try the slow path using BigDecimal. Either fix your data to be easier to parse, or fix your mapping not to attempt this conversion.
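
As an illustration only (the index and field names here are made up), this is the kind of document that ends up on that slow path: a long-mapped field receiving a value with a huge exponent. The request will eventually be rejected because the value is out of range for a long, but only after a lot of BigDecimal/BigInteger work, which is what the squareToomCook3 frames in your dump are showing:

    PUT my-index
    {
      "mappings": {
        "_doc": {
          "properties": {
            "counter": { "type": "long" }
          }
        }
      }
    }

    PUT my-index/_doc/1
    {
      "counter": "1.0E+100000"
    }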

Thanks for your quick response. Do you have any idea how to work out which index mapping fields are causing this? There are many indices.

I have no simple ideas for doing so, but perhaps there are log messages indicating indexing failures? If you are hitting java.math.BigInteger.squareToomCook3 then I think the numbers must be quite big.

Thanks, the issue is resolved. A couple of derived fields arrive sometimes as a number and sometimes as a string. Because the first value received was numeric, dynamic mapping assigned the fields a number type by default. I have corrected them to a string type now, and CPU usage is very low, less than 5%, which is amazing. Once again, thanks for your inputs.
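
For anyone hitting the same thing, the fix was along these lines (index, type and field names below are placeholders): map the affected derived field explicitly as a string type, such as keyword, so dynamic mapping can no longer pin it to a number type based on the first value it happens to see:

    PUT my-new-index
    {
      "mappings": {
        "_doc": {
          "properties": {
            "derived_field": { "type": "keyword" }
          }
        }
      }
    }

Since the type of an existing field cannot be changed in place, this applies to newly created indices (for example via an index template), or after reindexing the old data.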

