Search Load balance nodes disconnect from cluster


(Jelle Smet) #1

Hi list,

I have an ES cluster of 6 nodes running version 0.90.10

4 physical nodes:

node.master: true
node.data: true

Each of these nodes has a Logstash process that consumes logs from AMQP and
indexes the data to localhost.

2 VM nodes:

node.master: false
node.data: false

These 2 nodes have Kibana installed and function as "GUI" nodes making use
of the "search load balance" functionality ES offers.

The 2 search load balance nodes disconnect from the cluster very often (at
least 2 times per 60 minutes), seemingly without any reason. Traffic is
minimal at 2.6GB/day (not in production yet).
Network engineers have monitored connections and set up tests to detect packet
loss and other network-related issues, without any result.

The symptoms are:

  • All requests to the "search load balance nodes" port tcp/9200 hang.
    (most common)
  • After a while in this state, HTTP requests on all nodes hang. (rare)
  • Meanwhile, indexing just continues without a problem on all 4 physical
    nodes.
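
To tell a request that hangs apart from one that is refused outright, a small probe with an explicit timeout can help. This is only a minimal sketch: the function name `probe_http`, the host name in the usage note, and the 5 second timeout are my own assumptions, not anything ES provides.

```python
import socket


def probe_http(host, port, timeout=5.0):
    """Classify one HTTP probe as 'ok', 'closed', 'hang', 'refused' or 'error'.

    Hypothetical helper for illustration; the timeout value is an assumption.
    """
    try:
        # The timeout applies to the connect and to the later send/recv calls.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            request = "GET / HTTP/1.0\r\nHost: {}\r\n\r\n".format(host)
            sock.sendall(request.encode("ascii"))
            # One byte of response is enough to show the node is answering.
            data = sock.recv(1)
            return "ok" if data else "closed"
    except socket.timeout:
        # Connect or read timed out: this is the "hang" symptom.
        return "hang"
    except ConnectionRefusedError:
        # Nothing is listening on the port at all.
        return "refused"
    except OSError:
        # Any other network failure (unreachable host, reset, ...).
        return "error"
```

Running something like `probe_http("gui-node-1", 9200)` from cron on each node (the host name is a placeholder) would at least give timestamps for when the hangs start.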

A log extract of one of the "search load balance" nodes is attached.

Any ideas or advice to identify and solve this problem would be appreciated.

Thanks,

Jelle

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c49d12e8-5d34-452a-84d4-5c06e7e648b2%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jelle Smet) #2

A follow-up on my own post, as it might be helpful for others:

I found out that a firewall between the nodes was dropping open connections
after a certain period of inactivity.
Apparently ES does not cope well with this.

Solution:

Set network.tcp.keep_alive to true in the ES configuration.

Add the following parameters to sysctl.conf:

net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 6
net.ipv4.tcp_keepalive_intvl = 10
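
For anyone wondering what these three values mean in practice: the kernel sends the first keepalive probe after tcp_keepalive_time seconds of idle time, repeats it every tcp_keepalive_intvl seconds, and gives up after tcp_keepalive_probes unanswered probes. The Linux default for tcp_keepalive_time is 7200 seconds, so the first probe normally comes far too late for a typical firewall idle timeout. A rough sketch of the arithmetic follows; the function names and the 300 second firewall timeout are assumptions for illustration only, since I never stated the real timeout above.

```python
def keepalive_timeline(idle_time, probes, interval):
    """Return (seconds until first probe, seconds until a dead peer is detected)."""
    first_probe = idle_time
    # The connection is declared dead after every probe has gone unanswered.
    dead_after = idle_time + probes * interval
    return first_probe, dead_after


def keeps_connection_open(idle_time, firewall_idle_timeout):
    """True if the first probe goes out before the firewall drops the connection."""
    return idle_time < firewall_idle_timeout


# Values from the sysctl.conf settings above.
first_probe, dead_after = keepalive_timeline(idle_time=60, probes=6, interval=10)
print(first_probe, dead_after)

# ASSUMPTION: 300 s is a made-up firewall idle timeout, not the real one.
print(keeps_connection_open(60, firewall_idle_timeout=300))
print(keeps_connection_open(7200, firewall_idle_timeout=300))  # Linux default
```

With the settings above, an idle connection gets a probe after 60 seconds, and a dead peer is noticed after roughly 120 seconds.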


