Elasticsearch node unresponsive, high active_opens, CLOSE_WAIT

sinneduy · September 10, 2015, 8:37am

Received an alert saying that one of my nodes was down; when I tried to curl / it just hung.

Checked on the health of the cluster and the node and noted something very strange:

        "network": {
            "tcp": {
                "active_opens": 102188,
                "passive_opens": 7133683,
                "curr_estab": 205,
                "in_segs": 1483621255,
                "out_segs": 2405602124,
                "retrans_segs": 569006,
                "estab_resets": 9251,
                "attempt_fails": 3252,
                "in_errs": 11,
                "out_rsts": 23640
            }
        },

# sudo netstat -tupn |grep CLOSE_WAIT |  wc -l
11711
# sudo netstat -tupn  |  wc -l
11940

shows a ton of CLOSE_WAIT
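CLOSE_WAIT means the remote side has closed the connection but the local process has not yet closed its end of the socket. To see how the sockets break down by state, rather than grepping for one state at a time, here is a minimal sketch (not from the original post) that tallies states from saved `netstat -tupn` output. The function name `count_tcp_states` and the sample rows are hypothetical, and the column layout is assumed to match Linux net-tools.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP socket states in the text output of `netstat -tupn`."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: proto recv-q send-q local foreign state pid/program
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Hypothetical sample rows, for illustration only.
sample = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        1      0 10.0.0.5:9200           10.0.0.9:51234          CLOSE_WAIT  1234/java
tcp        0      0 10.0.0.5:9300           10.0.0.6:40112          ESTABLISHED 1234/java
tcp        1      0 10.0.0.5:9200           10.0.0.9:51240          CLOSE_WAIT  1234/java
udp        0      0 10.0.0.5:123            10.0.0.1:123                        -
"""

for state, count in count_tcp_states(sample).most_common():
    print(state, count)
```

Grouping the same rows by foreign address instead of by state would show which client is leaving the connections behind.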

The active opens were about 10x more on this node than any other one. What are active opens vs passive opens, and what is the expected number of active/passive opens, and how can I make elasticsearch close these connections more aggressively?

I'm running 1.7.1 on Java 1.8

{
  "status" : 200,
  "name" : <redacted>,
  "cluster_name" : <redacted>,
  "version" : {
    "number" : "1.7.1",
    "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
    "build_timestamp" : "2015-07-29T09:54:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}

http://elasticsearch-users.115913.n3.nabble.com/Increasing-CLOSE-WAIT-connections-and-HTTP-current-open-metric-td4019752.html

seems to be related, but it doesn't seem to have a conclusive solution or explanation of what is happening

Jason_Wee · November 11, 2016, 5:25am

I ran into a situation similar to yours yesterday. When I checked the system monitoring history again today, I saw a high number of passive_opens just before the node became unresponsive. passive_opens usually hovers around 2,500,000, but because I had written a multithreaded script with 2 threads, it rose to approximately 3,500,000 just before the node became unresponsive and detached from the cluster.

I also checked other metrics: CPU usage (%user), index rate, translog operations, merge requests, CMS GC activity, and JVM direct pool memory usage were all higher than usual.

From these, empirically, I guess the node was simply too busy with GC, merging, and indexing, and it timed out. I can see several ping request timeouts in the log too.
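Since the opens counters appear to be cumulative totals rather than current values, the change between two samples is more telling than the absolute number. Here is a minimal sketch, assuming two saved copies of the `network.tcp` block from node stats. The helper `tcp_counter_deltas` is hypothetical, and apart from the two passive_opens figures quoted above, the numbers are made up for illustration.

```python
def tcp_counter_deltas(before: dict, after: dict) -> dict:
    """Return after - before for each counter present in both snapshots.

    curr_estab is skipped because it is a point-in-time gauge, not a
    running total, so its difference is not a rate of new activity.
    """
    return {
        key: after[key] - before[key]
        for key in before
        if key in after and key != "curr_estab"
    }


# Illustrative snapshots; only the passive_opens values come from the thread.
usual = {"active_opens": 100000, "passive_opens": 2500000, "curr_estab": 200}
before_hang = {"active_opens": 100500, "passive_opens": 3500000, "curr_estab": 205}

print(tcp_counter_deltas(usual, before_hang))
```

Dividing each delta by the seconds between the two samples gives a connections-per-second rate that can be compared across nodes.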

hth
