CLOSE_WAIT Sockets


(elasticsearcher) #1

Hi all,

I'm running ElasticSearch backing up to HDFS with the following elasticsearch.yml:

index:
  store:
    fs:
      memory:
        enabled: true
gateway:
  type: hdfs
  hdfs:
    uri: hdfs://blah:54310
    path: elasticsearch/gateway

I've noticed that over the last week the number of sockets in the CLOSE_WAIT state has grown steadily.

$ netstat -aonp | grep CLOSE_WAIT | wc -l
6484

When I look at the sockets, I can see that the owning process is elasticsearch and the destination port is 50010, which I looked up and found to be a Hadoop datanode port.
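
In case it helps anyone reproduce that breakdown, here is a minimal Python sketch that tallies CLOSE_WAIT sockets by remote port from netstat-style text. The function name close_wait_by_remote_port and the sample lines (addresses, local ports, PID column) are made up for illustration. It assumes the Linux netstat column layout, where the foreign address is the field immediately before the state.

```python
from collections import Counter


def close_wait_by_remote_port(netstat_output):
    """Count CLOSE_WAIT sockets per remote port in netstat-style text.

    Assumes the foreign address is the field right before the state
    column, as in Linux `netstat -an` output (an assumption, not a
    guarantee for every platform).
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if "CLOSE_WAIT" not in fields:
            continue
        state_idx = fields.index("CLOSE_WAIT")
        if state_idx < 1:
            continue  # malformed line: no address before the state
        remote_addr = fields[state_idx - 1]
        # rsplit copes with IPv6-style addresses that contain several colons
        remote_port = remote_addr.rsplit(":", 1)[-1]
        counts[remote_port] += 1
    return counts


# Hypothetical sample lines, not taken from the real machine.
SAMPLE = """\
tcp        0      0 10.0.0.5:41234     10.0.0.9:50010     CLOSE_WAIT  10600/java
tcp        1      0 10.0.0.5:41240     10.0.0.9:50010     CLOSE_WAIT  10600/java
tcp        0      0 10.0.0.5:9200      10.0.0.7:53311     ESTABLISHED 10600/java
tcp        0      0 0.0.0.0:50010      0.0.0.0:*          LISTEN      27091/java
"""

for port, count in close_wait_by_remote_port(SAMPLE).most_common():
    print(port, count)
```

To run it against a live box, the real netstat output can be passed in as a string instead of SAMPLE. If nearly all of the count sits on one remote port, that narrows the leak down to a single peer service.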

Has anyone seen this before, or have any ideas about how to fix it?

Thanks!


(ppearcy) #2

There are settings to reduce the amount of time that a socket stays in
this state after closing. I know on Windows this is 2 minutes and is
configurable through the registry. Not sure about other OSes.

However, sockets should get reused. What client are you using, and if
it is REST/HTTP-based, is it using keep-alive?


(Shay Banon) #3

Hi,

I am not doing anything specific with Hadoop beyond using its official
API. I would expect it to reuse the same socket when talking to the Hadoop
cluster. Maybe you could check with the Hadoop folks to see why this might
happen?

-shay.banon


(elasticsearcher) #4

Our client is based on the pyelasticsearch class found on the elasticsearch website. I've looked into this, and as far as I can tell Python uses keep-alive by default in the client. I'm fairly sure the client isn't the problem, since the port numbers point to the connection between Hadoop and elasticsearch. I'll take a look at the interface between the two and see if I can find anything; if not, I'll try to find another way around it.

Thanks.


(Shay Banon) #5

Yeah, it looks like the CLOSE_WAIT sockets come from communicating with
Hadoop. In elasticsearch, a single FileSystem instance is used for the node;
I'm not sure how the Hadoop internals manage sockets. Might this be the
expected behavior?


(elasticsearcher) #6

I've found something interesting:

$ ps aux | grep elasticsearch
<10600>
$ /usr/sbin/lsof -p 10600 | grep CLOSE_WAIT
<all destination 50010>
$ netstat -lp | grep 50010
<27091>
$ /usr/sbin/lsof -p 27091 | grep CLOSE_WAIT

Confirming:
$ /usr/sbin/lsof | grep CLOSE_WAIT | grep -v 10600

So every CLOSE_WAIT socket on the machine belongs to the elasticsearch process (10600), and the datanode process (27091) has none. It appears that something is wrong either with the way elasticsearch uses the Hadoop API or with the Hadoop API itself, since I wouldn't expect thousands of CLOSE_WAIT sockets to be normal behavior. I'll keep looking and maybe post on the Hadoop forums.
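
For what it's worth, CLOSE_WAIT showing up only on the elasticsearch side is the pattern you get when the remote end has closed a connection and the local process has not yet called close() on its own socket. Below is a minimal Python sketch of that mechanism on loopback, with a throwaway listener standing in for the datanode. The function demo_half_closed and everything in it is illustrative only; it says nothing about what the Hadoop client actually does internally.

```python
import socket


def demo_half_closed():
    """Show the local side of a connection after the peer has closed it."""
    # Throwaway listener standing in for the remote service (assumption:
    # loopback is available; port 0 lets the OS pick a free port).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    conn, _ = server.accept()

    # The remote side closes first, which sends a FIN to the client.
    conn.close()

    # recv() returns b"" once the FIN has arrived (end of stream).
    data = client.recv(1024)

    # The client has seen the FIN but has not closed its own socket, so the
    # kernel keeps it in CLOSE_WAIT for as long as this descriptor stays open.
    open_before_close = client.fileno() != -1

    # Only the local close() releases the socket and ends CLOSE_WAIT.
    client.close()
    closed_after = client.fileno() == -1

    server.close()
    return data, open_before_close, closed_after


print(demo_half_closed())
```

If the client.close() call is skipped and the object is kept alive, the socket stays in CLOSE_WAIT indefinitely, since that state has no timeout of its own. A count that grows steadily over a week would be consistent with connections being opened and never closed locally, though that is only a guess about the cause here.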

