Data loss on shutdown of node in a 3 node cluster


(Gregory Durham) #1

I am having an issue in elasticsearch 0.90.3, where I am using logstash to
insert data. My issue is that when I stop a node in the 3 node cluster, no
new data is inserted, yet the status still shows as green. This in turn
causes logstash to continue sending data, which elasticsearch is
essentially dropping on the floor.

There were 3 nodes in the cluster; ip2/es02 was the master and I shut it
down. I then monitored the remaining nodes: a master election occurred and
a new master was selected. However, no matter what I do, no data flows
into the elasticsearch cluster, and there are no errors on the logstash
side of things.

Any help on how to troubleshoot this would be great!

Some general information:

curl http://localhost:9200/_nodes?pretty=true
{
  "ok" : true,
  "cluster_name" : "tower3",
  "nodes" : {
    "idOfThree" : {
      "name" : "es03",
      "transport_address" : "inet[/ip3:9300]",
      "hostname" : "es03",
      "version" : "0.90.3",
      "http_address" : "inet[/ip3:9200]",
      "attributes" : {
        "datacenter" : "usw"
      }
    },
    "idOfOne" : {
      "name" : "es01",
      "transport_address" : "inet[/ip1:9300]",
      "hostname" : "es01",
      "version" : "0.90.3",
      "http_address" : "inet[/ip1:9200]",
      "attributes" : {
        "datacenter" : "usw"
      }
    }
  }
}

curl http://localhost:9200/_cluster/health?pretty=true
{
  "cluster_name" : "logstash",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 5,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
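Note that "green" only means all allocated shards are assigned; it says nothing about whether every expected node is present. One way to catch this state from the outside is to compare number_of_nodes against the expected count. A minimal sketch (the helper name and the localhost:9200 endpoint are my assumptions, not anything from the cluster itself):

```shell
# node_count: pull "number_of_nodes" out of a /_cluster/health JSON
# document supplied on stdin (plain grep, so no jq dependency).
node_count() {
  grep -o '"number_of_nodes" *: *[0-9]*' | grep -o '[0-9]*$'
}

# Against a live cluster (hypothetical endpoint), alert when a node is missing:
#   expected=3
#   actual=$(curl -s http://localhost:9200/_cluster/health | node_count)
#   [ "$actual" -lt "$expected" ] && echo "only $actual of $expected nodes up"
```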

My Elasticsearch config (somewhat modified, i.e. the ips are real ips, and
node.name and cluster.name are different). The config below is the same on
all of the servers, except that node.name is the actual hostname of each
machine.

##################################################################

# /etc/elasticsearch/elasticsearch.yml

# Base configuration for a write heavy cluster

# Cluster / Node Basics

cluster.name: logstash

# Node can have arbitrary attributes we can use for routing

node.name: es01
node.datacenter: usw

path.data: /data/var/lib/elasticsearch/

# Force all memory to be locked, forcing the JVM to never swap

bootstrap.mlockall: true

# Indexing Settings for Writes

indices.memory.index_buffer_size: 50%
index.refresh_interval: 30
index.translog.flush_threshold_ops: 50000
index.store.compress.stored: true

# Threadpool Settings

# Search pool

threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100

# Bulk pool

#threadpool.bulk.type: fixed
#threadpool.bulk.size: 60
#threadpool.bulk.queue_size: 300

# Index pool

threadpool.index.type: fixed
threadpool.index.size: 60
threadpool.index.queue_size: 100

# Minimum nodes alive to constitute an operational cluster (should be n/2+1,
# where n is the total number of nodes in the cluster)
discovery.zen.minimum_master_nodes: 2

# Unicast Discovery (disable multicast)

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ "ip1", "ip2", "ip3" ]
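The n/2+1 quorum behind minimum_master_nodes works out as follows for a 3 node cluster (a quick arithmetic sketch; shell integer division floors, which is the floor(n/2) the formula intends):

```shell
# Quorum needed for master election: floor(n/2) + 1.
n=3
quorum=$(( n / 2 + 1 ))   # 3/2 floors to 1, so quorum = 2
echo "$quorum"            # prints 2

# With quorum = 2, losing one node still leaves a majority, but an
# isolated single node can never elect itself master, which is what
# prevents a split-brain.
```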

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Gregory Durham) #2

Hoping this may be of some help to someone down the road, so that they
don't go through the same issue I did.

After doing a deep dive into my config, what I found was the following:
org.elasticsearch.discovery.zen: [Warwolves] failed to send join request to
master ... reason [org.elasticsearch.ElasticSearchTimeoutException: Timeout
waiting for task.]

What I found is that, since the client and the server both lived on the
same machine, there was contention for port 9300, which meant the client
would start on 9301 (I found this via netstat -lanp). Reading through the
documentation pointed out that the transport protocol is bidirectional,
which meant that on the other side 9301 also had to be open and
accessible. Since this is in AWS, I made a rule in the security group to
allow several ports above 9300 from within, and this appears to have
resolved the issue.
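To repeat the check that exposed the 9301 client, the netstat output can be filtered down to listeners in the transport range. A rough sketch; the function name is made up, and on a live box you would feed it `netstat -lant`:

```shell
# es_transport_ports: from netstat-style lines on stdin, print the
# distinct listening ports in the elasticsearch transport range 9300-9399.
es_transport_ports() {
  grep 'LISTEN' | grep -o ':93[0-9][0-9]' | sort -u
}

# On a live node:
#   netstat -lant | es_transport_ports
# Every port printed here must be reachable from the other nodes, since
# the transport protocol is bidirectional.
```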

Thank you,
Greg

On Wednesday, October 16, 2013 1:34:01 PM UTC-7, Gregory Durham wrote:




(system) #3