Hi,
I noticed a strange behavior in the Elasticsearch cluster (v1.5.2) that we run at our company. For some of the nodes, the value of `http.current_open` (and `process.open_file_descriptors`) reported by Node Stats goes up over time until some limit (over 1k for `http.current_open`) is hit. When that happens, `http.current_open` drops to a more normal, single-digit value. Right after that, the problem reappears on different node(s).
It's best illustrated with a graph (we use the Elasticsearch StatsD plugin, pushing to Graphite):
First of all, I haven't found what `http.current_open` really stands for. I assume it is the number of open incoming TCP connections for the node's HTTP transport (listening on port 9200 by default). The problem is that these numbers don't match:
- I first identify the node with the highest `http.current_open`:
  ```
  jq -r '.host + " " + (.http.current_open|tostring)' | sort -rnk 2 | head -n 1
  some.host.name 1254
  ```
- Then I log in to this host and count these connections myself:
  ```
  62
  ```
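For anyone who wants to reproduce the comparison, the two steps could be sketched roughly as below. This is only an illustration, not necessarily the exact commands used above: the `/_nodes/stats/http` endpoint, the `.nodes[]` filter, the use of `ss`, the function names, and port 9200 are all assumptions.

```shell
#!/bin/sh
# Sketch only: endpoint, port, and tool choice are assumptions.

# Print "host current_open" for the node with the highest http.current_open.
# $1 is a hypothetical host:port of any node in the cluster, e.g. localhost:9200.
top_http_open() {
  curl -s "http://$1/_nodes/stats/http" |
    jq -r '.nodes[] | .host + " " + (.http.current_open|tostring)' |
    sort -rnk 2 | head -n 1
}

# Count established TCP connections whose local port is $1.
# Reads `ss -tn` style output on stdin
# (column 1 = state, column 4 = local address:port).
count_established() {
  awk -v port="$1" '$1 == "ESTAB" && $4 ~ (":" port "$") { n++ } END { print n + 0 }'
}

# Usage (run the second command on the node reported by the first):
#   top_http_open localhost:9200
#   ss -tn | count_established 9200
```

Counting only `ESTAB` sockets on the local port leaves out `TIME-WAIT` and outgoing connections, which is one possible reason a manual count can differ from what another tool reports.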
Can someone please explain to me why these numbers don't match?
I'd also like to understand why this happens. Our apps access the cluster via HAProxy (using round-robin), if that changes anything. It's probably worth mentioning that the number of TCP connections reported by HAProxy doesn't match `http.current_open` either.
Best,
Tomasz