Error from _cat APIs stemming from one bad data node

I have an Elasticsearch cluster running 7.3.1. I was adding and removing some nodes, but no other config changes took place. The _cat/nodes API started returning an error from all nodes, and there was nothing in any of the log files across the cluster:

$ curl localhost:9200/_cat/nodes
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Values less than -1 bytes are not supported: -16b"}],"type":"illegal_argument_exception","reason":"Values less than -1 bytes are not supported: -16b"},"status":400}

The _cat/indices API would just hang forever and not return results. Other _cat APIs and _nodes seemed to return fine.

I tracked it down to a single data node and this error:

$ curl http://localhost:9200/_nodes/_local/stats
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Values less than -1 bytes are not supported: -16b"}],"type":"illegal_argument_exception","reason":"Values less than -1 bytes are not supported: -16b","suppressed":[{"type":"illegal_state_exception","reason":"Failed to close the XContentBuilder","caused_by":{"type":"i_o_exception","reason":"Unclosed object or array found"}}]},"status":400}

After restarting that data node, the _cat APIs returned to normal.

  1. Is there a way in the future I can better debug this type of error?
  2. Is it expected that a problem on one node would break the _cat APIs for every node across the cluster?

If that occurs again, can you include the stack trace, please? Just run curl 'http://localhost:9200/_nodes/_local/stats?error_trace=true' and share the output. Thanks!
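
For anyone who needs to find which node is the bad one first, here is a rough sketch of the same check run against each node in turn. It assumes every node answers HTTP on port 9200 (as in the commands above) and that you pass your own node addresses. The host names es-data-1 and es-data-2 and the function names are made up for illustration. Because _local only reports on the node that receives the request, a non-200 status points at that specific node.

```shell
#!/bin/sh
# Sketch only: host names and function names are hypothetical placeholders.

# Build the node-local stats URL, with error_trace=true so a failing
# node includes the stack trace in its error response.
stats_url() {
  printf 'http://%s:9200/_nodes/_local/stats?error_trace=true' "$1"
}

# Print only the HTTP status code for one node.
# curl reports 000 when it cannot connect or times out.
node_status() {
  curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$(stats_url "$1")"
}

# Turn a host and its status code into a one-line report.
report() {
  if [ "$2" = "200" ]; then
    echo "$1 ok"
  else
    echo "$1 FAILING (HTTP $2)"
  fi
}

# Check every host given as an argument.
check_nodes() {
  for host in "$@"; do
    report "$host" "$(node_status "$host")"
  done
}
```

Usage would look like `check_nodes es-data-1 es-data-2`. For any host reported as failing, running curl against the URL from `stats_url` without `-o /dev/null` shows the full error with the stack trace.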

Oh, I didn't know about that option, thanks!

I ran into this, too, a few weeks ago when ~20TB worth of shards were rebalancing after replacing some nodes on a 7.2 cluster. For me, it was a transient condition that fixed itself within the hour. I didn't know about that stack trace trick, either, but will keep it in mind if I run into this again.
