Severe: HTTP connection errors

I have a frequently recurring problem that is affecting the stability
of the search feature on my website.
I have elasticsearch 0.14.3 installed and running on my website. Often
the HTTP communication fails and I get no response. The web server
itself is not the problem, because my website stays up and running
throughout.
For example, when I run this:
curl -XGET 'http://localhost:9200/my_index/_search?pretty=true' -d '
{
"query":{
"matchAll":{}
}
}'

I get no response, and curl reports: curl: (56) Failure when receiving
data from the peer

In my PHP script, when I try to fetch the search results using
file_get_contents, it fails with this message:
file_get_contents(http://localhost:9200/my_index/_search?q=querystring&size=20):
failed to open stream: HTTP request failed!

I have no idea why this is happening. The search works perfectly well
for a few days and then this error comes up. I sometimes restart the
Apache server process on my website, but I am not sure that is related
to this issue, because the search continues to work properly even
after such a restart.
I have tried restarting the elasticsearch process, but that doesn't
help.

For now, the only workaround I have is to stop the elasticsearch
process, remove the installed package, and then install a fresh copy
of the same package. Does that mean the binary file is corrupted or
something? But why would it get corrupted so often?
I would be grateful for any pointers on this problem.

Thanks,
Niranjan

Hi Niranjan

On Fri, 2011-12-09 at 02:28 -0800, Niranjan wrote:

I am having a frequently occurring problem that is affecting the
stability of search feature on my website.
I have elasticsearch 0.14.3 installed and running on my website. Many
times, the http communication fails- I get no response. There are no
issues with my web server because my website is continuously up and
running.
For eg. when I run this:
curl -XGET 'http://localhost:9200/my_index/_search?pretty=true' -d '
{
"query":{
"matchAll":{}
}
}'

I get no response and this is the message that I get: curl: (56)
Failure when receiving data from the peer

A few things to consider:

  1. What is in the elasticsearch logs?
  2. You're using a really old version - many bugs have been fixed since
    then, consider upgrading
  3. If you're doing a lot of requests, and not using persistent HTTP
    connections, you may be running out of sockets on your OS, and
    things time out until your OS frees them up
  4. If you're not using the bootstrap.mlockall option (with the
    ES_MIN_MEM and ES_MAX_MEM set appropriately, and ulimit -l unlimited)
    then you may be running into issues with swap, which slows the
    JVM to a crawl when doing garbage collection
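
To illustrate point 3, here is a minimal sketch (in Python rather than the PHP used on the site) of reusing one HTTP connection for several searches instead of opening a new socket per request. FakeSearchHandler is a hypothetical stand-in for elasticsearch so the sketch runs without a node, and search_many and the query strings are illustrative names only, not part of any elasticsearch client.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import quote


class FakeSearchHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for elasticsearch, only so the sketch is runnable."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 is required for keep-alive
    client_ports = set()           # one entry per distinct TCP connection seen

    def do_GET(self):
        # Each new TCP connection arrives from a different client port.
        FakeSearchHandler.client_ports.add(self.client_address[1])
        body = json.dumps({"hits": {"total": 0}}).encode()  # made-up response
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def search_many(host, port, queries):
    """Run several searches over ONE persistent connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    results = []
    try:
        for q in queries:
            conn.request("GET", "/my_index/_search?q=" + quote(q) + "&size=20")
            resp = conn.getresponse()
            # The body must be read fully before the socket can be reused.
            results.append(json.loads(resp.read()))
    finally:
        conn.close()
    return results


server = ThreadingHTTPServer(("127.0.0.1", 0), FakeSearchHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
results = search_many("127.0.0.1", server.server_address[1], ["a", "b", "c"])
server.shutdown()
server.server_close()
connections_used = len(FakeSearchHandler.client_ports)
print(len(results), "searches over", connections_used, "connection(s)")
```

A call such as file_get_contents opens a fresh connection each time, so under load the sockets can pile up; whether that is what is happening here is something only the logs can confirm.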

clint

Hi Clinton,
Thanks for your reply. My responses below:

  1. what is in the elasticsearch logs

The logs are showing two different places where elasticsearch is
failing:
(a) [2011-12-09 01:15:02,818][DEBUG][action.index ] [Professor X] [my_index][3], node[dakvMD03QOGADfRKG82MLA], [P], s[STARTED]: Failed to execute [index {[my_index][my_type][4ee121f6db1bc0aa6e000000], source[{"name": "Abraham"}]}]
org.elasticsearch.index.engine.IndexFailedEngineException: [my_index][3] Index failed for [my_type#4ee121f6db1bc0aa6e000000]

Caused by: java.io.FileNotFoundException: /etc/elasticsearch-0.14.3/data/elasticsearch/nodes/0/indices/my_index/3/index/_n.tvx (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.<init>(SimpleFSDirectory.java:180)
    at org.apache.lucene.store.NIOFSDirectory.createOutput(NIOFSDirectory.java:85)

(b) [2011-12-09 02:15:01,666][WARN ][netty.channel.socket.nio.NioServerSocketPipelineSink] Failed to initialize an accepted socket.
org.elasticsearch.common.netty.channel.ChannelException: Failed to create a selector.
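
For anyone wanting to check their own logs for the same two failures, a rough Python sketch follows. The sample text is cut down from the excerpts quoted above; find_fd_exhaustion is a made-up helper name, and treating "Failed to create a selector" as a sign of running out of file descriptors is an assumption, not something the log states.

```python
import re

# Sample lines shortened from the log excerpts quoted above.
SAMPLE_LOG = """\
[2011-12-09 01:15:02,818][DEBUG][action.index ] [Professor X] [my_index][3] Failed to execute
Caused by: java.io.FileNotFoundException: /etc/elasticsearch-0.14.3/data/elasticsearch/nodes/0/indices/my_index/3/index/_n.tvx (Too many open files)
[2011-12-09 02:15:01,666][WARN ][netty.channel.socket.nio.NioServerSocketPipelineSink] Failed to initialize an accepted socket.
org.elasticsearch.common.netty.channel.ChannelException: Failed to create a selector.
"""

# Messages that, in this thread, point at descriptor exhaustion.
# The second pattern is an assumption (see the note above the code).
FD_EXHAUSTION = re.compile(r"Too many open files|Failed to create a selector")


def find_fd_exhaustion(log_text):
    """Return the log lines that match either message."""
    return [line for line in log_text.splitlines() if FD_EXHAUSTION.search(line)]


matches = find_fd_exhaustion(SAMPLE_LOG)
for line in matches:
    print(line)
```

To run it against a real log, read the file contents and pass them to find_fd_exhaustion in place of SAMPLE_LOG.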

  2. You're using a really old version

I am stuck on this old version because porter_stem does not work for
me on version 0.18.4, which is one of the latest versions I tried.
It works on the 0.14.3 package but not on the 0.18.4 package. Am I
missing some additional packages or something?

  3. If you're doing a lot of requests, and not using persistent HTTP
    connections

If you look at the log I posted, it says: "Failed to initialize an
accepted socket." Could this be caused by what you describe in your
third point?

  4. If you're not using the bootstrap.mlockall option ...

I will look into this and see how I can enable it.

Please help with your suggestions here.

Thanks,
Niranjan

Hi Niranjan

This looks like the issue:

Caused by: java.io.FileNotFoundException: /etc/elasticsearch-0.14.3/data/elasticsearch/nodes/0/indices/my_index/3/index/_n.tvx (Too many open files)

You need to raise the open file limit, e.g. ulimit -n 60000
or some other reasonably high value.
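
To see what limit is actually in effect, something like the following Python sketch may help. It reports the limits of the process running the script, so it would need to run as the same user and from the same environment that starts elasticsearch. The /proc lookup is Linux-only, and the helper names are made up for this example.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid="self"):
    """Count open descriptors via /proc (Linux only); None if unavailable."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


soft, hard = open_file_limits()
used = count_open_fds()
print(f"soft limit={soft} hard limit={hard} open now={used}")
```

Passing the elasticsearch process id to count_open_fds (with sufficient permissions) shows how close that process is to its limit. The soft limit is the value that ulimit -n changes for the current shell.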

clint

What's not working with porter_stem? If there is a problem, we can
easily fix it!
