Is this stacktrace a reason for cluster instability?

Hi List,

I'm running ES 1.0.1 in a 6 node configuration.
Some days ago we have experienced instability issues with our cluster.
At a certain moment no documents could be indexed anymore. After
restarting the indexing processes (Logstash), indexing worked again for a
short period of time only to stall again after a brief period.
The ES cluster had to be restarted to restore normal behavior. After that,
some (recent) index shards stayed unassigned.
I dropped the replication value for those indexes to 0 in order to clear
the unassigned shards. Enabling replication resulted again into
unassigned shards for the impacted indexes.
I have left the replication level to 0 for the troublesome indexes in order
to clear the cluster status.
Meanwhile, indexing and replication works again for any newly created
indexes.

What we noticed:

  • The logs didn't reveal any immediate cause. The only reported issue
    we saw in the logs prior to the incident is the below mentioned stack trace.
    Afterwards, we have countered this issue by creating a template which
    enforces the offending field to be treated as a string, which should
    prevent the below mentioned error.

  • Our ES collected metrics revealed a sudden drop of total number of
    Java threads at the exact same moment.

My question: "Could the below mentioned stracktrace be the cause of any
cluster instability?"

Thanks,

Jelle

[2014-03-15 03:15:38,414][DEBUG][action.bulk ]
[xxxx-xxxxxxx-logs-001] [logstash-2014.03.15][0] failed to execute bulk
item (index) index {[logstash-2014.03.15][logs][MBzzcesjQTSjUBhryTAgzQ],
source[{"@source":"tcp://:0:0:0:0:0:0:1:60186/","@tags":[],"@fields":{"timestamp":["Mar
15
04:15:37"],"logsource":["xxxxxxx-sss01"],"program":["snmptrapd"],"snmptrapsource":["xxxxxxx-hdp04"],"snmptrapseverity":["INFORMATIONAL"],"message":["CLI/Telnet
user logout: iRMC S2 CLI/Telnet user '' logout from
xxx.xxx.xxx.xxx"]},"@timestamp":"2014-03-15T03:15:37.931+00:00","@message":"<13>Mar
15 04:15:37 xxxxxxx-sss01 snmptrapd: xxxxxxx-hdp04 INFORMATIONAL CLI/Telnet
user logout: iRMC S2 CLI/Telnet user
'' logout from
xxx.xxx.xxx.xxx\n","@type":"snmptrapd","@collector":["xxxxxxx-sss01.xxx.xxxxxx"],"@version":"1"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse
[@fields.snmptrapsource]
at
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:418)
at
org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:616)
at
org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
at
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:461)
at
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
at
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
at
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
at
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:371)
at
org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:400)
at
org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:153)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to
parse date field [xxxxxxx-hdp04], tried both date format
[dateOptionalTime], and timestamp number with locale []
at
org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:582)
at
org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:510)
at
org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:215)
at
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:408)
... 15 more
Caused by: java.lang.IllegalArgumentException: Invalid format:
"xxxxxxx-hdp04"
at
org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:754)
at
org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:576)
... 18 more

[2014-03-15 03:15:57,635][DEBUG][action.bulk ]
[xxxx-xxxxxxx-logs-001] [logstash-2014.03.15][0] failed to execute bulk
item (index) index {[logstash-2014.03.15][logs][jnTLutANRxeQKP4z_SF5Jw],
source[{"@source":"tcp://:0:0:0:0:0:0:1:51511/","@tags":[],"@fields":{"timestamp":["Mar
15
04:15:57"],"logsource":["xxxxxxx-xxxxx"],"program":["snmptrapd"],"snmptrapsource":["xxxxxxx-hdp13"],"snmptrapseverity":["INFORMATIONAL"],"message":["CLI/Telnet
user logout: iRMC S2 CLI/Telnet user '' logout from
xxx.xxx.xxx.xxx"]},"@timestamp":"2014-03-15T03:15:57.514+00:00","@message":"<13>Mar
15 04:15:57 xxxxxxx-xxxxx snmptrapd: xxxxxxx-hdp13 INFORMATIONAL CLI/Telnet
user logout: iRMC S2 CLI/Telnet user '' logout from
xxx.xxx.xxx.xxx\n","@type":"snmptrapd","@collector":["xxxxxxx-xxxxx.xxx.xxxxxx"],"@version":"1"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse
[@fields.snmptrapsource]
at
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:418)
at
org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:616)
at
org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
at
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:461)
at
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
at
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
at
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
at
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:371)
at
org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:400)
at
org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:153)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to
parse date field [xxxxxxx-hdp13], tried both date format
[dateOptionalTime], and timestamp number with locale []
at
org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:582)
at
org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:510)
at
org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:215)
at
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:408)
... 15 more
Caused by: java.lang.IllegalArgumentException: Invalid format:
"xxxxxxx-hdp13"
at
org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:754)
at
org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:576)
... 18 more

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/77769ff2-7e46-4339-ae2b-22409e76f415%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

For me this exceptino is just saying that the ES couldnt convert ":
"xxxxxxx-hdp13" to Date object

It sounds like just incorrect query .

So i dont think this is the reason for your troubles.

Nore that the message is with severity DEBUG , and not Error or CRIT

On Wednesday, March 19, 2014 11:17:23 AM UTC+1, Jelle Smet wrote:

Hi List,

I'm running ES 1.0.1 in a 6 node configuration.
Some days ago we have experienced instability issues with our cluster.
At a certain moment no documents could be indexed anymore. After
restarting the indexing processes (Logstash), indexing worked again for a
short period of time only to stall again after a brief period.
The ES cluster had to be restarted to restore normal behavior. After
that, some (recent) index shards stayed unassigned.
I dropped the replication value for those indexes to 0 in order to clear
the unassigned shards. Enabling replication resulted again into
unassigned shards for the impacted indexes.
I have left the replication level to 0 for the troublesome indexes in
order to clear the cluster status.
Meanwhile, indexing and replication works again for any newly created
indexes.

What we noticed:

  • The logs didn't reveal any immediate cause. The only reported issue
    we saw in the logs prior to the incident is the below mentioned stack trace.
    Afterwards, we have countered this issue by creating a template which
    enforces the offending field to be treated as a string, which should
    prevent the below mentioned error.

  • Our ES collected metrics revealed a sudden drop of total number of
    Java threads at the exact same moment.

My question: "Could the below mentioned stracktrace be the cause of any
cluster instability?"

Thanks,

Jelle

[2014-03-15 03:15:38,414][DEBUG][action.bulk ]
[xxxx-xxxxxxx-logs-001] [logstash-2014.03.15][0] failed to execute bulk
item (index) index {[logstash-2014.03.15][logs][MBzzcesjQTSjUBhryTAgzQ],
source[{"@source":"tcp://:0:0:0:0:0:0:1:60186/","@tags":,"@fields":{"timestamp":["Mar
15
04:15:37"],"logsource":["xxxxxxx-sss01"],"program":["snmptrapd"],"snmptrapsource":["xxxxxxx-hdp04"],"snmptrapseverity":["INFORMATIONAL"],"message":["CLI/Telnet
user logout: iRMC S2 CLI/Telnet user '' logout from
xxx.xxx.xxx.xxx"]},"@timestamp":"2014-03-15T03:15:37.931+00:00","@message":"<13>Mar
15 04:15:37 xxxxxxx-sss01 snmptrapd: xxxxxxx-hdp04 INFORMATIONAL CLI/Telnet
user logout: iRMC S2 CLI/Telnet user
'' logout from
xxx.xxx.xxx.xxx\n","@type":"snmptrapd","@collector":["xxxxxxx-sss01.xxx.xxxxxx"],"@version":"1"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse
[@fields.snmptrapsource]
at
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:418)
at
org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:616)
at
org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
at
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:461)
at
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
at
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
at
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
at
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:371)
at
org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:400)
at
org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:153)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed
to parse date field [xxxxxxx-hdp04], tried both date format
[dateOptionalTime], and timestamp number with locale
at
org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:582)
at
org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:510)
at
org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:215)
at
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:408)
... 15 more
Caused by: java.lang.IllegalArgumentException: Invalid format:
"xxxxxxx-hdp04"
at
org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:754)
at
org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:576)
... 18 more

[2014-03-15 03:15:57,635][DEBUG][action.bulk ]
[xxxx-xxxxxxx-logs-001] [logstash-2014.03.15][0] failed to execute bulk
item (index) index {[logstash-2014.03.15][logs][jnTLutANRxeQKP4z_SF5Jw],
source[{"@source":"tcp://:0:0:0:0:0:0:1:51511/","@tags":,"@fields":{"timestamp":["Mar
15
04:15:57"],"logsource":["xxxxxxx-xxxxx"],"program":["snmptrapd"],"snmptrapsource":["xxxxxxx-hdp13"],"snmptrapseverity":["INFORMATIONAL"],"message":["CLI/Telnet
user logout: iRMC S2 CLI/Telnet user '' logout from
xxx.xxx.xxx.xxx"]},"@timestamp":"2014-03-15T03:15:57.514+00:00","@message":"<13>Mar
15 04:15:57 xxxxxxx-xxxxx snmptrapd: xxxxxxx-hdp13 INFORMATIONAL CLI/Telnet
user logout: iRMC S2 CLI/Telnet user '' logout from
xxx.xxx.xxx.xxx\n","@type":"snmptrapd","@collector":["xxxxxxx-xxxxx.xxx.xxxxxx"],"@version":"1"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse
[@fields.snmptrapsource]
at
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:418)
at
org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:616)
at
org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
at
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:461)
at
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
at
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
at
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
at
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:371)
at
org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:400)
at
org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:153)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed
to parse date field [xxxxxxx-hdp13], tried both date format
[dateOptionalTime], and timestamp number with locale
at
org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:582)
at
org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:510)
at
org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:215)
at
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:408)
... 15 more
Caused by: java.lang.IllegalArgumentException: Invalid format:
"xxxxxxx-hdp13"
at
org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:754)
at
org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:576)
... 18 more

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/df80ffe8-fbcd-416b-86a2-433e9ec94f6f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

So i dont think this is the reason for your troubles.
Nore that the message is with severity DEBUG , and not Error or CRIT

True, this was indeed not the cause of the problems we ran into.

Cheers,

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9673b328-2af3-419f-abb6-1ac5b8559562%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.