I recently upgraded my 6 data nodes from 4 CPUs to 8 CPUs each.
Now I am seeing these errors in Kibana that I've never seen before:
Shard Failures
The following shard failures occurred:
Index: logstash-2017.02.08 Shard: 0 Reason: {"type":"transport_serialization_exception","reason":"Failed to deserialize response of type [org.elasticsearch.search.fetch.FetchSearchResult]","caused_by":{"type":"illegal_state_exception","reason":"unexpected byte [0x69]"}}
I did. Many of the posts I've found point to Kibana. I don't know which is at fault, nor do I entirely care at this point; it is just another issue, for yet another inexplicable reason, on top of the problems I continue to have.
All my API checks show the ES cluster healthy.
I've spent months running down issue after issue, upgrading, rebuilding, etc.
All I did was shut down my hosts and change them from 4 CPUs to 8 CPUs, and now I get this error randomly.
Discover: Unable to parse/serialize body
Error: Unable to parse/serialize body
at respond (http://KIBANA/bundles/kibana.bundle.js?v=14695:14:6482)
at checkRespForFailure (http://KIBANA/bundles/kibana.bundle.js?v=14695:14:6156)
at http://KIBANA/bundles/kibana.bundle.js?v=14695:1:24479
at processQueue (http://KIBANA/bundles/commons.bundle.js?v=14695:38:23621)
at http://KIBANA/bundles/commons.bundle.js?v=14695:38:23888
at Scope.$eval (http://KIBANA/bundles/commons.bundle.js?v=14695:39:4619)
at Scope.$digest (http://KIBANA/bundles/commons.bundle.js?v=14695:39:2359)
at Scope.$apply (http://KIBANA/bundles/commons.bundle.js?v=14695:39:5037)
at done (http://KIBANA/bundles/commons.bundle.js?v=14695:37:25027)
at completeRequest (http://KIBANA/bundles/commons.bundle.js?v=14695:37:28702)
Sorry to hear that you're encountering so many problems. The two messages you posted both look like Elasticsearch is not responding correctly. You said your health checks indicate that the cluster is healthy. Would you mind posting the responses of calling http[s]://${YOURCLUSTER}/_cat/nodes?v and http[s]://${YOURCLUSTER}/_cat/indices?v?
All those errors indicate that something is wrong with the communication within the Elasticsearch cluster. Does the log output from the Elasticsearch nodes (e.g. DATANODE-05) show anything that could be correlated with this? Where are you hosting your cluster? If it is AWS, there are known issues caused by the networking setup of the AWS Ubuntu images, as described in http://logz.io/blog/elasticsearch-cluster-disconnects/.
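In case it helps, here is a minimal sketch of how those responses can be captured from the command line, assuming the cluster is reachable on localhost:9200 without authentication (adjust host, port, and scheme to your setup):

# Per-node overview: one line per node with roles, heap, and load
curl -s 'http://localhost:9200/_cat/nodes?v'
# Per-index overview: health, shard counts, and document counts
curl -s 'http://localhost:9200/_cat/indices?v'
# Optional: overall cluster health as an additional data point
curl -s 'http://localhost:9200/_cluster/health?pretty'

The node listing should show all of your data and master nodes; any node missing or flapping there would fit the transport-level problems suggested by the deserialization error.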
Last night, I completely rebuilt these 10 servers. I deleted their disks and started from scratch.
As soon as I started the Logstash service on one of my shipping servers, I was immediately hit with these errors. On the ES side, I am getting errors on all 6 data nodes that are receiving the logs.
UPDATE: I had turned off shipping to one of my data center clusters while I finalized my setup on the 6-data-node cluster. Given that, the CPU upgrade appears to be the root cause of the current issues. This morning I turned off shipping to the 6-data-node cluster and pointed it back at my older 4-data-node cluster, whose CPUs I had not upgraded from 4 to 8. So far, I have not received a SINGLE error.
[2017-02-08T23:45:05,054][DEBUG][o.e.a.b.TransportShardBulkAction] [KNOX-LOGDB-01] [logstash-2017.02.08][5] failed to execute bulk item (index) index {
[logstash-2017.02.08][wineventlog][AVohMGiU11AUrPoYaON-],
<#### LOG MESSAGE REMOVED ####>
}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [process_id]
at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:298) ~[elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:438) ~[elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:564) ~[elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:384) ~[elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:361) ~[elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:93) ~[elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:66) ~[elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:275) ~[elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:533) ~[elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.index.shard.IndexShard.prepareIndexOnPrimary(IndexShard.java:510) ~[elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:196) ~[elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:201) ~[elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:348) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.index(TransportShardBulkAction.java:155) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.handleItem(TransportShardBulkAction.java:134) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.onPrimaryShard(TransportShardBulkAction.java:120) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.onPrimaryShard(TransportShardBulkAction.java:73) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.TransportWriteAction.shardOperationOnPrimary(TransportWriteAction.java:76) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.TransportWriteAction.shardOperationOnPrimary(TransportWriteAction.java:49) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:914) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:884) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:113) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:327) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:262) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:864) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:861) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.index.shard.IndexShardOperationsLock.acquire(IndexShardOperationsLock.java:142) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationLock(IndexShard.java:1652) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryShardReference(TransportReplicationAction.java:873) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.access$400(TransportReplicationAction.java:92) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:279) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:258) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:250) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:610) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:596) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.2.0.jar:5.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Caused by: java.lang.NumberFormatException: For input string: "0x0000000000000c08"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_121]
at java.lang.Long.parseLong(Long.java:589) ~[?:1.8.0_121]
at java.lang.Long.parseLong(Long.java:631) ~[?:1.8.0_121]
... 40 more
What this exception indicates is that the field process_id is mapped as a long, but the document being indexed contains the string "0x0000000000000c08" in that field. So either the mapping is incorrect, or Logstash is shipping documents with incorrect field types. I am not sure how that could result in the cluster returning errors like the ones shown earlier, but maybe it's a result of the cluster getting swamped with such indexing errors. Is there any error regarding cluster membership in the logs?
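In case it's useful, here is a minimal sketch of how you could inspect the current mapping and, if you decide process_id should accept hexadecimal strings like the one above, map it as a keyword for future indices. The host (localhost:9200), the template name, and the exact field path are assumptions based on the log above; adjust them to your setup, and note that a template change only affects newly created indices:

# Check how process_id is currently mapped in today's index
curl -s 'http://localhost:9200/logstash-2017.02.08/_mapping/field/process_id?pretty'

# Hypothetical template that maps process_id as a keyword for new logstash-* indices
# (merge this with your existing Logstash template rather than replacing it)
curl -s -XPUT 'http://localhost:9200/_template/logstash-process-id' -H 'Content-Type: application/json' -d '
{
  "template": "logstash-*",
  "order": 1,
  "mappings": {
    "wineventlog": {
      "properties": {
        "process_id": { "type": "keyword" }
      }
    }
  }
}'

Alternatively, you could convert or strip the hexadecimal value in your Logstash filter before it reaches Elasticsearch, so the existing long mapping keeps working.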