Master node stops responding, but status is green locally

I have a 4-node cluster: 1 master node with data turned off, and 3 data
nodes (2 of which have the default master setting, 1 with master off).
All of our application requests are directed at the master node by name,
so we are calling node1.myelasticsearch.com/_query...
We are also using ES-head to monitor the server.
The server runs fine for a while, but then just stops responding to
requests. ES-head no longer returns the overview page data. It does,
however, keep showing the cluster as green. If I log into each node and
request the health status from localhost, it returns green. But the
application gets no response, and ES-head cannot show the status on the
overview page; the structured query page doesn't load either.
When I restart ES (using kill and then starting it again, since we don't
have it installed as a service), everything starts responding again. There
are no errors in the log.
Any idea why this would be happening?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

On Thu, 2013-03-21 at 06:49 -0700, browe wrote:

Any idea why this would be happening?

Are you using HTTP connections without keepalive? You may be running out
of sockets. Look at the netstat output for sockets in TIME_WAIT.

clint


I'm not sure whether we are using HTTP connections without keepalive.
Are you asking about our application, which uses RestClient to make the
calls, or about an ES server setting that I could check?
For ES we would be using the defaults.
Netstat does not show any TIME_WAIT sockets on the master node right now.
Should I be checking all nodes?
And if they are in TIME_WAIT, how would I fix that?

Thank you,

Brian

On Thu, Mar 21, 2013 at 11:54 AM, Clinton Gormley clint@traveljury.com wrote:

Are you using HTTP connections without keepalive? You may be running out
of sockets. Look at netstat output for sockets in TIME_WAIT

clint

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/zKIv9Mk_YkI/unsubscribe?hl=en-US
.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Sorry, to be clearer: a typical issue that stops requests from getting
through to Elasticsearch is that the HTTP client you are using does not use
keepalive, which means that it opens a new socket for every request.
The kernel keeps closed sockets around for a while (in the TIME_WAIT state)
before finally clearing them out.

There is a limit on how many sockets can be open at once. Once you hit it,
you have to wait for old connections to be removed before you can open new
sockets.

You are using your client to talk to the master node, so I would look for
TIME_WAITs on the master node and on the clients.
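
As a rough illustration of that check, here is a minimal Python sketch that tallies TCP connection states from `netstat -ant` output. The function names are made up for this example, and it assumes the usual six-column layout in which the state is the sixth field; adjust it if your netstat prints something different.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections by state from `netstat -ant`-style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with tcp/tcp4/tcp6; the state is assumed to be
        # the sixth column (Proto Recv-Q Send-Q Local Foreign State).
        if len(fields) >= 6 and fields[0].lower().startswith("tcp"):
            states[fields[5]] += 1
    return states


def local_tcp_states():
    """Run netstat on this machine and summarise the connection states."""
    result = subprocess.run(
        ["netstat", "-ant"], capture_output=True, text=True, check=True
    )
    return count_tcp_states(result.stdout)
```

Calling `local_tcp_states()["TIME_WAIT"]` on the master node and on each client machine while the problem is occurring gives a number to compare against a quiet period. A count in the tens of thousands would point towards socket exhaustion.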

As for fixing it, you can search for the kernel settings that control how
many open sockets are allowed, or switch to an HTTP client that supports
keepalive.
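
To show what the keepalive option looks like in practice, here is a small Python sketch using the standard library's `http.client`, which speaks HTTP/1.1 and reuses the connection when the server allows it. This is only an illustration of the idea, not the RestClient configuration for your application; `fetch_all` is a hypothetical helper name.

```python
import http.client


def fetch_all(host, port, paths):
    """Send several GET requests over one persistent connection.

    Returns the HTTP status code of each response, in order.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    statuses = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # Drain the body so the same socket can carry the next request.
            response.read()
            statuses.append(response.status)
    finally:
        conn.close()
    return statuses
```

For example, `fetch_all("node1.myelasticsearch.com", 9200, ["/_cluster/health"] * 100)` would make 100 requests over a single socket, assuming ES is listening on its default HTTP port 9200. Opening a fresh connection for each request would instead leave up to 100 sockets in TIME_WAIT.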

clint

On Thu, Mar 21, 2013 at 7:58 PM, Brian Rowe browe@perceivant.com wrote:

And if they are in TIME_WAIT - how would I fix that?
