Hi,
We use Filebeat and ingest nodes for log aggregation on a three-node 5.2.1 cluster deployed on RHEL 7.3 servers. We recently experienced the following:
$ curl node0:9200?pretty
{
  "error" : {
    "root_cause" : [
      {
        "type" : "circuit_breaking_exception",
        "reason" : "[parent] Data too large, data for [<http_request>] would be larger than limit of [1491035750/1.3gb]",
        "bytes_wanted" : 1491058680,
        "bytes_limit" : 1491035750
      }
    ],
    "type" : "circuit_breaking_exception",
    "reason" : "[parent] Data too large, data for [<http_request>] would be larger than limit of [1491035750/1.3gb]",
    "bytes_wanted" : 1491058680,
    "bytes_limit" : 1491035750
  },
  "status" : 503
}
All nodes were returning this reply when queried with a GET on '/'.
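For reference, this is how we have since been watching breaker usage. The nodes stats API can be filtered to just the circuit-breaker section, which shows the current estimated size and limit for the parent breaker (a sketch against one of our nodes; the host name is ours):

$ curl -s 'http://node0:9200/_nodes/stats/breaker?pretty'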
We are still investigating what sent such a large request, but the real problem was the strange cluster state afterwards. Cluster health still reported all three nodes, and the cluster was green:
$ curl -s http://node0:9200/_cluster/health?pretty
{
  "cluster_name" : "log_preprod",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 6,
  "active_shards" : 12,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
However, node stats showed only one node:
$ curl -s http://node0:9200/_nodes/stats?pretty | jq '.nodes | keys'
[
  "9aVJKZEFQcKL2tWFbUDYsg"
]
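As a cross-check on that discrepancy, the cat nodes API lists the nodes the cluster actually sees, one per line, and is easy to compare against number_of_nodes from cluster health (a sketch using the same host):

$ curl -s 'http://node0:9200/_cat/nodes?v'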
We had to restart every node in the cluster to get it working again.
Any clue as to what happened here?
Thanks
M