Leaked TPC file descriptors for Hadoop Gateway?

Hi everyone,

We are using ElasticSearch to index documents stored in HBase, so I have
set up ElasticSearch to use the Hadoop gateway, pointing at our HDFS
cluster:

gateway.type: hdfs
gateway.hdfs.uri: hdfs://:8020
gateway.hdfs.path: /elasticsearch

I've been running load tests on the system and find that after a day or
two, ES hits its open file limit (presently set to 64,000). The output of
'sudo lsof -u elasticsearch' shows many thousands of open TCP connections
in the CLOSE_WAIT state:

java 14109 elasticsearch 661u IPv6 9820568 0t0 TCP :38257->:50010 (CLOSE_WAIT)
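
To put a number on this, here is a minimal Python sketch (not part of my actual setup; the function name is hypothetical) that tallies CLOSE_WAIT sockets per remote port from saved lsof output. It assumes each socket appears on a single line ending in "TCP local->remote (STATE)", as in the sample above, and that lsof was run so that ports are shown numerically. Port 50010 is normally the HDFS DataNode data-transfer port, so a large count there would point at connections to DataNodes.

```python
import re
from collections import Counter

# Matches the tail of an lsof TCP row: "TCP local->remote (STATE)".
# Either host part may be empty, as in the redacted sample above.
_TCP_ROW = re.compile(r"\bTCP\s+(\S*)->(\S*)\s+\((\w+)\)\s*$")


def close_wait_by_remote_port(lsof_lines):
    """Count CLOSE_WAIT sockets per remote port in lsof output lines."""
    counts = Counter()
    for line in lsof_lines:
        match = _TCP_ROW.search(line)
        if match is None:
            continue  # not a connected TCP row (regular file, pipe, listener)
        _local, remote, state = match.groups()
        if state != "CLOSE_WAIT":
            continue
        # The port is whatever follows the last colon of the remote endpoint.
        port = remote.rsplit(":", 1)[-1]
        counts[port] += 1
    return counts
```

Feeding it the lines of a file captured with 'sudo lsof -u elasticsearch > lsof.txt' returns a Counter keyed by remote port, which makes it easy to compare snapshots taken a few hours apart.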

I'm just starting to look into the problem more closely, but I found a
previous mailing list discussion about essentially the same issue:
http://elasticsearch-users.115913.n3.nabble.com/CLOSE-WAIT-Sockets-td1884117.html
That thread never reached a resolution, so I thought I'd ask here whether
anyone has seen or resolved this problem since.

Some more pertinent information:

  • ElasticSearch version is 0.19.9
  • We use the Cloudera Hadoop distribution rather than the Apache
    distribution. As a result, we cannot use the hadoop-core jar included with
    the elasticsearch-hadoop plugin. I have worked around this by prepending
    the locations of the correct jar files to ES_CLASSPATH in
    elasticsearch.in.sh. I suspect this may be related to the leak, but that
    is nothing more than a vague hunch.

Anyway, if anyone has seen similar problems or has suggestions for
debugging strategies, let me know.
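
One thing I may try first is measuring how fast the descriptors leak. The following is only a sketch under stated assumptions: it is Linux-only, it reads /proc/<pid>/fd (which needs to run as the elasticsearch user or root), and both function names are hypothetical. It counts open descriptors for a process and extrapolates linearly from the first and last sample to estimate the hours left before a given limit is reached.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Return the number of open file descriptors for pid (Linux only)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def hours_until_limit(samples, limit):
    """Estimate hours until limit from (unix_time, fd_count) samples.

    Uses the slope between the first and last sample only, so it is a
    rough linear extrapolation. Returns None when the count is not growing.
    """
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    if t1 <= t0 or n1 <= n0:
        return None
    per_second = (n1 - n0) / (t1 - t0)
    return (limit - n1) / per_second / 3600.0
```

Sampling count_open_fds every few minutes during a load test and passing the samples to hours_until_limit with a limit of 64000 would show whether the growth is steady or tied to particular operations such as gateway snapshots.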

Thanks,
Carl

--

Whoops, clearly the title should be "Leaked TCP file descriptors" :)

On Thursday, September 27, 2012 3:56:44 PM UTC-7, Carl C wrote:


--