I have an Elasticsearch cluster running 7.3.1. I was adding and removing some nodes, but no other config changes took place. The _cat/nodes API then started returning an error from every node, and nothing relevant showed up in any of the log files across the cluster:
$ curl localhost:9200/_cat/nodes
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Values less than -1 bytes are not supported: -16b"}],"type":"illegal_argument_exception","reason":"Values less than -1 bytes are not supported: -16b"},"status":400}
The _cat/indices API would just hang forever and not return results. Other _cat APIs and _nodes seemed to return fine.
I tracked it down to a single data node, where the local node stats API returned this error:
$ curl http://localhost:9200/_nodes/_local/stats
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Values less than -1 bytes are not supported: -16b"}],"type":"illegal_argument_exception","reason":"Values less than -1 bytes are not supported: -16b","suppressed":[{"type":"illegal_state_exception","reason":"Failed to close the XContentBuilder","caused_by":{"type":"i_o_exception","reason":"Unclosed object or array found"}}]},"status":400}
After restarting that data node, the _cat APIs returned to normal.
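For anyone hitting something similar, here is a minimal sketch of how the per-node check could be scripted instead of logging in to each node by hand. It assumes that _nodes still responds (as it did here) and that the error also reproduces when the bad node's stats are requested by node ID from any other node. The helper names fetch_json and find_failing_nodes are hypothetical, made up for this illustration. It has not been run against a cluster in this state.

```python
import json
import urllib.error
import urllib.request


def fetch_json(url, timeout=10):
    """GET a URL and return (http_status, parsed_body); HTTP errors are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", errors="replace")
        try:
            return err.code, json.loads(body)
        except ValueError:
            # Keep a non-JSON error body so it can still be inspected.
            return err.code, {"raw": body}


def find_failing_nodes(base_url, fetch=fetch_json):
    """Request each node's stats separately and collect the nodes whose request fails."""
    # filter_path trims the node info response down to node IDs and names.
    status, info = fetch(f"{base_url}/_nodes?filter_path=nodes.*.name")
    if status != 200:
        raise RuntimeError(f"could not list nodes: HTTP {status}")

    failing = []
    for node_id, node in info.get("nodes", {}).items():
        status, body = fetch(f"{base_url}/_nodes/{node_id}/stats")
        # A bad node can show up as an HTTP error, or as a failure counted
        # in the "_nodes" header of an otherwise successful response.
        node_failures = body.get("_nodes", {}).get("failed", 0)
        if status != 200 or node_failures > 0:
            error = body.get("error")
            if isinstance(error, dict):
                reason = error.get("reason", "unknown")
            else:
                reason = "see _nodes.failures in the response"
            failing.append((node.get("name", node_id), status, reason))
    return failing
```

Calling find_failing_nodes("http://localhost:9200") returns a list of (node name, HTTP status, reason) tuples, one per node whose stats request failed. The timeout in fetch_json is there so a request that hangs, as _cat/indices did, gives up instead of blocking the loop.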
- Is there a better way to debug this type of error in the future?
- Is it expected that a problem on one node would break the _cat APIs for every node across the cluster?