Can Index Parsing Errors Impact node/cluster stability?


(Russell Day) #1

Please refer to:


(Daniel Mitterdorfer) #2

Hi,

Hard to tell. I wouldn't expect increased heap usage due to that. However, if you're provoking a lot of exceptions, the JIT compiler will optimize that part of the code differently, which will likely lead to worse performance.

I don't know anything about your application, but you can get the current mapping of an index via the get mapping API.
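A minimal sketch of using the get mapping API (GET /<index>/_mapping) to see which type each field currently has. It assumes a 7.x-style typeless response (`{index: {"mappings": {"properties": ...}}}`), a cluster reachable at a base URL you supply, and that `index` is the concrete index name that appears as the response key. The function names are hypothetical, and multi-fields (`"fields"`) are ignored for brevity.

```python
import json
import urllib.request


def get_mapping(base_url: str, index: str) -> dict:
    """Fetch the raw mapping of `index` via GET /<index>/_mapping (needs a live cluster)."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/{index}/_mapping") as resp:
        return json.load(resp)


def flatten_properties(properties: dict, prefix: str = "") -> dict:
    """Turn nested mapping `properties` into {"dotted.path": "type"}."""
    fields = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        # Object fields often have no explicit "type"; Elasticsearch treats them as "object".
        fields[path] = spec.get("type", "object")
        if "properties" in spec:
            # Recurse into sub-fields of object/nested fields.
            fields.update(flatten_properties(spec["properties"], path + "."))
    return fields


def field_types(mapping_response: dict, index: str) -> dict:
    """Extract the flattened field -> type map for one index from a get mapping response."""
    props = mapping_response[index]["mappings"].get("properties", {})
    return flatten_properties(props)
```

Comparing the result of `field_types(get_mapping("http://localhost:9200", "my-index"), "my-index")` with the types your documents actually send is one way to spot which fields trigger parsing failures.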

Daniel


(Russell Day) #3

Thanks Daniel,

It seems unlikely to me as well; however, I just wanted to check. A common issue we are seeing is messages like this: "timed out waiting for all nodes to process published state".

What happens when this is the case? Does the master try again, or does it potentially remove the nodes that did not ack the latest version from the cluster?


(Daniel Mitterdorfer) #4

Hi,

If publishing fails, you will see a warning in the log, but there is no retry at that level. However, the node(s) that failed to acknowledge publication within the timeout will receive the full cluster state when the next cluster state update is published.
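The behaviour described above can be illustrated with a toy model. This is a deliberately simplified sketch, not Elasticsearch's actual publication code: the classes `Node` and `ToyMaster` are hypothetical, a node that misses the timeout is assumed not to have applied that version at all, and nodes that are on the previous version are assumed to receive a diff while any node that fell behind receives the full state.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    applied_version: int = 0  # last cluster state version this node applied
    responsive: bool = True   # False = does not ack within the publish timeout


@dataclass
class ToyMaster:
    nodes: list
    version: int = 0          # all nodes are assumed to start with version 0
    log: list = field(default_factory=list)

    def publish(self) -> dict:
        """Publish the next cluster state version; return what each node was sent."""
        previous, self.version = self.version, self.version + 1
        sent = {}
        for node in self.nodes:
            # A node on the previous version can apply a diff; a node that
            # missed an earlier publication gets the full cluster state.
            sent[node.name] = "diff" if node.applied_version == previous else "full"
            if node.responsive:
                node.applied_version = self.version
            else:
                # No retry at this level: the master only logs a warning.
                self.log.append(f"WARN {node.name} did not ack version {self.version} in time")
        return sent
```

In this model a node that misses one publication stays in the node list and simply catches up with a full state on the next publish, which mirrors the answer above.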

Daniel


(Russell Day) #5

So the node(s) that did not ack the cluster state update within the timeout are still in service and can serve requests, correct?


(Russell Day) #6

Also, I would like to clarify that our ingest service is the one throwing the HighLevelRestClient errors. The data node logs show:
org.elasticsearch.index.mapper.MapperParsingException: failed to parse...

Can you clarify whether a high number of these messages can lead to node health issues?
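To put a number on "a high number" of these failures, the bulk API response can be tallied on the ingest side: it carries a top-level `errors` flag, and each failed item has an `error` object whose `type` is `mapper_parsing_exception` for this case. The sketch below is written in Python for brevity and assumes access to the raw JSON bulk response; `count_item_errors` is a hypothetical helper name.

```python
from collections import Counter


def count_item_errors(bulk_response: dict) -> Counter:
    """Count failed bulk items by error type (e.g. "mapper_parsing_exception")."""
    counts = Counter()
    if not bulk_response.get("errors"):
        return counts  # the top-level flag is false when every item succeeded
    for item in bulk_response.get("items", []):
        # Each item is keyed by its action: "index", "create", "update" or "delete".
        for result in item.values():
            error = result.get("error")
            if error:
                counts[error.get("type", "unknown")] += 1
    return counts
```

Tracking these counts over time makes it easier to check whether spikes in rejected documents line up with the publish timeouts.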


(Daniel Mitterdorfer) #7

Hi,

Yes, usually the nodes are still in service (although it is always possible that a node died during that time period).

The fact that a cluster state update takes more than half a minute (the publish timeout defaults to 30 seconds) indicates that these nodes are stressed, though (many cluster state updates due to very frequent changes in the mapping?). I think it would make sense to dig deeper into why that is the case in your cluster.

Some pointers:

  • What is happening at the hardware level (disk, memory, CPU, network and other resources)?
  • What are the affected nodes doing at that point? (Use the hot_threads API, attach a profiler if on a test system, take thread stack traces, etc.)
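A small sketch of the requests behind these pointers, assuming the cluster is reachable over HTTP at a base URL such as http://localhost:9200 (an assumption). It uses the nodes hot threads API (GET /_nodes/hot_threads, with its `threads`, `interval` and `type` parameters) and the nodes stats API for JVM heap, OS/CPU, process and filesystem figures. The helper names and the node IDs in the usage note are hypothetical.

```python
import urllib.parse
import urllib.request


def hot_threads_url(base_url: str, node_ids=None, threads=3, interval="500ms", kind="cpu") -> str:
    """Build a GET /_nodes[/<ids>]/hot_threads URL; `kind` maps to the `type` parameter."""
    path = f"/_nodes/{','.join(node_ids)}/hot_threads" if node_ids else "/_nodes/hot_threads"
    query = urllib.parse.urlencode({"threads": threads, "interval": interval, "type": kind})
    return f"{base_url.rstrip('/')}{path}?{query}"


def node_stats_url(base_url: str, metrics=("jvm", "os", "process", "fs")) -> str:
    """Build a GET /_nodes/stats/<metrics> URL covering heap, CPU, and disk usage."""
    return f"{base_url.rstrip('/')}/_nodes/stats/{','.join(metrics)}"


def fetch_text(url: str, timeout: float = 60.0) -> str:
    """Fetch a URL as text (hot_threads returns plain text, node stats returns JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_text(hot_threads_url("http://localhost:9200", ["node-1"], threads=5))` could be run a few times while the publish timeouts are occurring, so the output reflects what the slow node is busy with at that moment.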

Daniel

