Infinite loop in Elasticsearch 0.19.9


(Wojciech Durczyński) #1

Hello.
Recently our Elasticsearch nodes have started to use 100% CPU. The only way
to solve this problem is to restart the broken nodes.
Thread dumps usually contain the following frames:

"elasticsearch[Dark Phoenix][bulk][T#1]" daemon prio=10
tid=0x00007f0748083800 nid=0x5e3c runnable [0x00007f07d740f000]
   java.lang.Thread.State: RUNNABLE
        at org.elasticsearch.common.collect.RegularImmutableMap.get(RegularImmutableMap.java:164)
        at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:583)
        at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:573)
        at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:441)
        at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:497)
        at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:439)
        at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:494)
        at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:438)
        at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:309)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:157)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

What's the problem?

--


(Martijn Van Groningen) #2

Hi Wojciech,

How often does this happen? Does it usually happen during a bulk
import? If so, with what options do you start the bulk import?
It would be great if you could somehow reproduce this issue.

Martijn

On 16 October 2012 09:21, Wojciech Durczyński
wojciech.durczynski@comarch.com wrote:


--
Met vriendelijke groet,

Martijn van Groningen

--


(Martijn Van Groningen) #3

In version 0.19.10, improvements have been made in the ObjectMapper
class to prevent endless looping. Can you check whether the 100%
CPU usage still occurs with version 0.19.10? Instead of endless
looping, an error should occur.
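The kind of conflict that typically triggers this is a field that is first
indexed as an object and later sent as a plain value (or the other way
around). A minimal toy sketch of such a shape-conflict check — purely
illustrative, not Elasticsearch's actual ObjectMapper code — looks like this:

```python
def validate(mapping, doc, path=""):
    """Walk a document and record each field's shape ('object' or 'value')
    in `mapping`; raise an error, instead of misbehaving, when a later
    document sends a conflicting shape for an already-mapped field."""
    for field, value in doc.items():
        full = f"{path}.{field}" if path else field
        kind = "object" if isinstance(value, dict) else "value"
        seen = mapping.setdefault(full, kind)
        if seen != kind:
            raise ValueError(
                f"field [{full}] was mapped as [{seen}] "
                f"but arrived as [{kind}]")
        if kind == "object":
            validate(mapping, value, full)

mapping = {}
validate(mapping, {"user": {"name": "alice"}})  # 'user' becomes an object field
try:
    validate(mapping, {"user": "bob"})          # conflicting scalar shape
except ValueError as e:
    print(e)                                    # error instead of a spin
```

The point of the sketch is only the behavior change: a shape mismatch now
surfaces as an indexing error you can act on, rather than a thread stuck at
100% CPU.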

If the 100% CPU usage still occurs with the latest version, it would also
help if you provided several full hot threads dumps
(http://localhost:9200/_nodes/hot_threads). This makes it easier to
pinpoint the issue.

Martijn

On 16 October 2012 13:38, Martijn v Groningen
martijn.v.groningen@gmail.com wrote:


--
Met vriendelijke groet,

Martijn van Groningen

--


(Wojciech Durczyński) #4

I checked version 0.19.10, and the infinite loop is indeed replaced by an
error in this version.
Thank you - this allowed me to fix the problems with my mapping.

On Tuesday, 16 October 2012 at 16:50:28 UTC+2, Martijn v
Groningen wrote:


--

