I received an alert saying that one of my nodes was down. When I tried to curl /, it just hung.
I checked the health of the cluster and the node and noticed something very strange in the node's network stats:
"network": {
  "tcp": {
    "active_opens": 102188,
    "passive_opens": 7133683,
    "curr_estab": 205,
    "in_segs": 1483621255,
    "out_segs": 2405602124,
    "retrans_segs": 569006,
    "estab_resets": 9251,
    "attempt_fails": 3252,
    "in_errs": 11,
    "out_rsts": 23640
  }
},
# sudo netstat -tupn |grep CLOSE_WAIT | wc -l
11711
# sudo netstat -tupn | wc -l
11940
So netstat shows a ton of connections in CLOSE_WAIT: 11711 of the 11940 lines it prints.
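For a fuller breakdown than the grep above, something like the following could tally connections by TCP state. It is only a rough sketch: the function name `count_tcp_states` is made up for illustration, and it assumes the usual Linux `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), which may differ on other platforms.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connections by state from `netstat -tn`-style text.

    Assumes the state is the sixth whitespace-separated column of each
    row whose protocol starts with "tcp" (tcp or tcp6).
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip header lines, UDP rows, and rows too short to have a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        counts[fields[5]] += 1
    return counts
```

It can be fed the captured output of netstat, for example `count_tcp_states(subprocess.check_output(["netstat", "-tn"], text=True))`, and the resulting counter's `most_common()` lists the states from most to least frequent.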
The active opens on this node are about 10x higher than on any other node. What are active opens vs. passive opens, what is the expected number of each, and how can I make Elasticsearch close these connections more aggressively?
I'm running Elasticsearch 1.7.1 on Java 1.8:
{
  "status" : 200,
  "name" : <redacted>,
  "cluster_name" : <redacted>,
  "version" : {
    "number" : "1.7.1",
    "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
    "build_timestamp" : "2015-07-29T09:54:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
seems to be related, but it doesn't have a conclusive solution or an explanation of what is happening.