I've noticed that over the last week the number of sockets in the CLOSE_WAIT status has grown steadily.
$ netstat -aonp | grep CLOSE_WAIT | wc -l
6484
When I look at the sockets, I can see the owning process is elasticsearch and the destination port is 50010, which I looked up and found to be a hadoop datanode port.
Has anyone seen this before, or have any ideas about how to fix it?
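For reference, this is roughly how I'm slicing the same netstat output to see where the sockets go and who owns them (the column positions assume the Linux net-tools netstat, so they may need adjusting on other systems):

# count CLOSE_WAIT sockets per remote port ($5 is the foreign address)
$ netstat -aonp | grep CLOSE_WAIT | awk '{print $5}' | awk -F: '{print $NF}' | sort | uniq -c | sort -rn | head
# count them per owning process ($7 is the PID/Program name column)
$ netstat -aonp | grep CLOSE_WAIT | awk '{print $7}' | sort | uniq -c | sort -rn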
There are settings to reduce the amount of time a socket stays in this state after closing. I know on Windows this is 2 minutes and is registry-configurable; I'm not sure about other OSes.
However, sockets should get reused. What client are you using, and if it is REST/HTTP based, is it using keep-alive?
I'm not doing anything specific with hadoop except using the formal API, and I would expect it to reuse the same socket when talking to the hadoop cluster. Maybe you could check a bit on the hadoop side and see why it might happen?
Our base client is based on the pyelasticsearch class found on the elasticsearch website. I've looked into this, and as far as I can tell Python uses keep-alive by default in the client. I'm pretty sure the client isn't the problem, since the port numbers indicate the issue is between hadoop and elasticsearch. I'll take a look at the interface between the two and see if I can find anything; if not, I'll try to find another way around it.
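To sanity-check that, I'm counting the CLOSE_WAIT sockets by which end they point at; this assumes the same netstat column layout as before and that the HTTP clients talk to the default port 9200:

# sockets going out to a datanode (remote port 50010)
$ netstat -aonp | grep CLOSE_WAIT | awk '$5 ~ /:50010$/' | wc -l
# sockets left behind by HTTP clients (local port 9200)
$ netstat -aonp | grep CLOSE_WAIT | awk '$4 ~ /:9200$/' | wc -l

So far everything I see matches the first pattern, which is why I think the Python side is fine.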
Yea, it looks like the CLOSE_WAIT sockets come from communicating with hadoop. In elasticsearch, a single FileSystem instance is used per node; I don't know enough about hadoop internals to tell how it manages sockets. Might be the expected behavior?
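If you want to dig a bit further, something like this might show which datanodes the stuck sockets point at (assuming lsof is available and elasticsearch is the java process on that box):

# remote ends of the CLOSE_WAIT sockets held by the java process, grouped per datanode
$ lsof -nP -iTCP -sTCP:CLOSE_WAIT -a -c java | awk 'NR>1 {print $9}' | awk -F'->' '{print $2}' | cut -d: -f1 | sort | uniq -c | sort -rn

If they spread evenly across the datanodes, it's probably down to how the hadoop client manages connections rather than one misbehaving node.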
It appears that something is wrong either with the way elasticsearch is using the hadoop API or with the hadoop API itself, since I wouldn't think thousands of CLOSE_WAIT sockets would be expected behavior. I'll keep looking and maybe post on the hadoop forums.