Sudden data loss!

Hi All,

I have an Elasticsearch cluster of 2 nodes in my staging environment.

Both nodes have the following config:

cluster.name: Staging
node.name: "node1"
index.number_of_shards: 5
index.number_of_replicas: 0
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1-ip", "node2-ip"]
cluster.routing.allocation.node_initial_primaries_recoveries: 8
network.bind_host: node1-ip
network.host: node1-ip
path.data: /mnt/common/es/data,/ebs1/common/es/data
path.work: /mnt/common/es/work
path.logs: /mnt/common/es/logs

The cluster had been running fine for months. Since the staging environment is
hosted on AWS, we occasionally had to restart the nodes after a network
break. And sometimes we would end up with unassigned shards, which we
used to delete manually via an API call.

Today, all of a sudden, I saw no data in the data directory.
I looked at the logs and they said "dangling index, scheduled to delete in 2
hrs" and "deleting dangling index". And it deleted all the dangling indices.

This is extremely scary. We lost all our data, and if it happens on
Production we are literally dead. We take backups for the production env
using rsync, and those can also get deleted during a sync that runs after
the dangling deletion.

Would appreciate your advice.
I understand there is no way to recover the data. But what I am interested
to know is why it happened.

Note: 1) There is no other cluster with the same name. 2) No trace of nodes
going down.

Thanks
Amit


If you are running an old version of ES, you might be a victim of this:

It was corrected in the latest versions.

On Friday, July 12, 2013 at 14:05:15 UTC+2, Amit Singh wrote:


Thanks DH,

This is not the case with us. We are using Elasticsearch version 0.19.3. I
looked through all the log files to find any trace of a
FileNotFoundException.

Rather, I see this in the logs:

[2013-07-09 05:47:02,681][INFO ][gateway ] [node2] recovered [2777] indices into cluster_state
[2013-07-09 06:50:27,511][WARN ][indices.cluster ] [node2] [50221434e4b06935a1f9a33950641a6de4b09ab8959ec4ae1][2] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [50221434e4b06935a1f9a33950641a6de4b09ab8959ec4ae1][2] shard allocated for local recovery (post api), should exists, but doesn't
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:108)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
[2013-07-09 06:50:27,824][WARN ][cluster.action.shard ] [node2] sending failed shard for [50221434e4b06935a1f9a33950641a6de4b09ab8959ec4ae1][2], node[9NR43cSZS7OdmTlU8Bm59A], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[50221434e4b06935a1f9a33950641a6de4b09ab8959ec4ae1][2] shard allocated for local recovery (post api), should exists, but doesn't]]]
[2013-07-09 06:50:27,824][WARN ][cluster.action.shard ] [node2] received shard failed for [50221434e4b06935a1f9a33950641a6de4b09ab8959ec4ae1][2], node[9NR43cSZS7OdmTlU8Bm59A], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[50221434e4b06935a1f9a33950641a6de4b09ab8959ec4ae1][2] shard allocated for local recovery (post api), should exists, but doesn't]]]
[2013-07-09 06:50:31,255][WARN ][indices.cluster ] [node2] [50221434e4b06935a1f9a33950641a6de4b09ab8959ec4ae1][2] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [50221434e4b06935a1f9a33950641a6de4b09ab8959ec4ae1][2] shard allocated for local recovery (post api), should exists, but doesn't
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:108)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)

Which I think is expected, as when I last restarted one of the nodes, some
of the shards became unassigned.

Further, here are the current logs:

[node1] [50fe6248e4b00edaf15be31150fe82e3e4b0d177eacd9b1042] deleting dangling index
[2013-07-11 15:40:25,407][INFO ][indices.store ] [node1] [50fe6248e4b00edaf15be31150fe82e3e4b0d177eacd9b1024] deleting dangling index
[2013-07-11 15:40:25,707][INFO ][indices.store ] [node1] [50fe6248e4b00edaf15be31150fe82e3e4b0d177eacd9b1025] deleting dangling index
[2013-07-11 15:40:26,607][INFO ][indices.store ] [node1] [50fe6248e4b00edaf15be31150fe82e3e4b0d177eacd9b1022] deleting dangling index
[2013-07-11 15:40:27,047][INFO ][indices.store ] [node1] [50fe6248e4b00edaf15be31150fe82e3e4b0d177eacd9b1023] deleting dangling index
[2013-07-11 15:40:27,947][INFO ][indices.store ] [node1] [50fe6248e4b00edaf15be31150fe82e3e4b0d177eacd9b1028] deleting dangling index
[2013-07-11 15:40:28,197][INFO ][indices.store ] [node1] [50fe6248e4b00edaf15be31150fe82e3e4b0d177eacd9b1029] deleting dangling index
...

Thanks
Amit

On Friday, July 12, 2013 5:35:15 PM UTC+5:30, Amit Singh wrote:


Hi,

are you using EBS volumes or just the internal drives? You could check the
uptime of the instances (on the machine, not in the EC2 console!). We had
the effect this week that 4 instances got rebooted by Amazon. In such a
case, if you don't use EBS, your data is gone.

Andrej

On Friday, 12 July 2013 14:05:15 UTC+2, Amit Singh wrote:


We are using the internal drives and not EBS on staging. We use EBS on
production.

We have verified there was no reboot of the instance; the system has an
uptime of 350+ days. There is other software on the same machine and it is
working fine, with no data loss. So this looks like an Elasticsearch
issue.

Thanks
Amit

On Friday, July 12, 2013 5:35:15 PM UTC+5:30, Amit Singh wrote:


Hello Amit,

I think upgrading ES should solve the problem on its own:

Otherwise, setting indices.store.dangling_timeout to -1 should at least
prevent you from losing data.
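
For example, something like this in elasticsearch.yml on both nodes (only a
sketch; please double-check the exact setting name and whether -1 really
means "never delete" for your 0.19.x version before relying on it):

indices.store.dangling_timeout: -1

With the timeout disabled, dangling indices should stay on disk instead of
being scheduled for deletion, so at least the data files survive until you
decide what to do with them.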

Best regards,
Radu

On Fri, Jul 12, 2013 at 4:28 PM, Amit Singh amitsingh.kec@gmail.com wrote:


--
http://sematext.com/ -- Elasticsearch -- Solr -- Lucene


Hi Radu,

We have plans to upgrade to the latest version of Elasticsearch in the near
future, but not immediately.

We have two nodes in the cluster and the minimum master nodes setting is
two. I restarted one of the nodes a few days back because of a network
disconnect, so it should not have local indices that would become dangling
after rejoining the cluster. I am not able to understand what, in a running
cluster, triggered all the indices to become dangling.

I would like to understand the cause, as this could happen in production!

Thanks
Amit

On Friday, July 12, 2013 5:35:15 PM UTC+5:30, Amit Singh wrote:


Hi Shay,

Sorry to address this issue to you, as you may be occupied with other high
priority work. But we are not able to get a clue of what happened to the
cluster and the sudden data loss, and what triggered the dangling indices
and their deletion.
Since we have only two nodes in staging, when one of the nodes goes away
because of a network issue while data loading is still happening for some
indices, I assume we may see some unassigned/corrupted shards in the
cluster. But why did all of them get dangled and deleted?

What are the best policies for disaster recovery management in
Elasticsearch? A dangling deletion would delete the indices in our backup
as well, since we take it with rsync.

Thanks
Amit

On Friday, July 12, 2013 5:35:15 PM UTC+5:30, Amit Singh wrote:


I hate to be blunt, but if you are worried about data loss, you should have
replicas. I wouldn't run any database without some sort of data replication.
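
For new indices that is a one-line change in elasticsearch.yml, and existing
indices can usually be raised live via the update-settings API. A rough
sketch (node1-ip is the placeholder host from your config; verify the API
behaves this way on your ES version):

index.number_of_replicas: 1

curl -XPUT 'http://node1-ip:9200/_settings' -d '{ "index": { "number_of_replicas": 1 } }'

The cost is roughly double the disk usage across your two nodes, but a copy
lost on one node can then be recovered from the replica on the other.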

--
Ivan

On Fri, Jul 12, 2013 at 12:07 PM, Amit Singh amitsingh.kec@gmail.com wrote:


Must agree with Ivan, and sorry for being impolite.

Amit, ES has replica level 1 active by default for a very good reason. By
setting replica level 0, you knowingly opted out of data recovery
and accepted data loss.

ES itself provides only a simple, Lucene-based shard index checker for
data repair. But this is only an emergency tool for file system crashes or
lost files, with no guarantee of recovering anything.

Amit, you have the following issues:

  • a 2-node cluster is never enough for a reliable ES cluster. These
    mini-clusters are naturally exposed to split brain simply by taking one
    node down and restarting it without a network connection to the
    running node. The setting minimum_master_nodes will not help in this case.

  • ES does not tell you if there is a split brain. There is a cluster
    state (red, yellow, green). You must strictly obey the cluster state in
    your clients while indexing and immediately stop indexing when the cluster
    is not green, or if operations fail. Any data you push into a cluster
    that is not green is at risk of never being replayable/recoverable by
    the ES cluster later (see the health-check sketch after this list).

  • dangling indexes are the consequence of two masters alive and
    connected. One master will propose to delete the indexes while the other
    may not. This is random because there is no byzantine fault tolerance in
    ES - the masters will conflict in everything they do and they will not
    agree about the cluster state. There is no known method for a split-off
    master node to rejoin a partial cluster. And if you let the masters
    continue to coexist, it is highly probable that they will delete all the
    indexes that were dangling, after a delay, at random.
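
As a sketch of the health check mentioned above (node1-ip is a placeholder
for whichever node your indexing client talks to):

curl -s 'http://node1-ip:9200/_cluster/health?pretty'

If "status" is not "green", or "number_of_nodes" is less than you expect,
the client should stop indexing and alert instead of pushing more data.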

Another point is that the ES default network timeouts are quite short for
systems that must operate in heavily loaded network environments (VMs,
EBS), so a first config step is to increase the network TCP timeouts to
mitigate the risk of node disconnects.
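
For illustration only (the setting names come from the zen discovery
fault-detection options; the values are arbitrary examples to tune for your
environment, and worth checking against the docs for your version):

discovery.zen.ping.timeout: 10s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 5

Longer fault-detection timeouts let a node wait out short network hiccups
before declaring the other node dead.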

Maybe one chapter on the ES doc pages should describe the pros and cons
of a replica setup and the consequences of data loss and dangling indexes
more clearly, since many users seem to neglect the importance of creating
a fault-tolerant, replica-based setup.

Jörg

On 13.07.13 09:27, Ivan Brusic wrote:


Hey Ivan and Jörg,

Thanks for your responses. I'd rather have people be blunt and respond than
keep quiet. That helps my cause :). So don't worry about being blunt!!

On Friday, July 12, 2013 5:35:15 PM UTC+5:30, Amit Singh wrote:


My further addition,

responses inline, marked with --->

On Sat, Jul 13, 2013 at 1:37 PM, Jörg Prante joergprante@gmail.com wrote:

Must agree with Ivan, and sorry for being impolite.

Amit, ES has replica level 1 active by default for very good reason. By
setting replica level 0, you were fully aware to opt out data recovery and
you were accepting data loss.

--->> Fully aware of this fact! I am fine if I lose a node and because of
that lose data. But I am struggling to find why it would happen otherwise.
Another thing I need educating on is how a replica would help here. Do you
mean that if I have a replica, ES will check against it before sending a
delete request, or will it delete the replica as well? If it checks the
replica beforehand, then that makes sense. Another related question: if a
shard/index gets corrupted (I have yet to find the actual reason why it gets
corrupted), what is the chance of the replica getting corrupted as well? I
experienced this a couple of times earlier, and hence I could not find much
importance in replicas apart from the fact that they can serve additional
search requests. Some education will definitely help here.

ES itself provides only a simple, Lucene-based shard index checker for
data repair. But this is only an emergency tool for file system crash or
lost files and no guarantuee to recover anything.

Amit, you have the following issues:

  • a 2-node cluster is never enough for creating an ES cluster. These
    mini-clusters are naturally exposed for split brains by just taking one
    node down and restarting it without having network connection to the
    running node. The setting minimum_master_nodes will not work in this case.

--->> After bringing up the second node (which went down), the cluster
state showed clearly that the second node had joined. So do you mean the
cluster state API can show a false picture and is not always reliable?

  • ES does not tell you if there is a split brain. There is a cluster state
    (red, yellow, green). You must strictly obey the cluster state in your
    clients while indexing and immediately stop indexing when cluster is not
    green, or if operations failed. Any data you push into a cluster by being
    not green is risky and the risk is they can't be replayed/recovered by an
    ES cluster later.

---> Great suggestion. Will implement this. But if the cluster state is
not reliable (based on the previous explanation), then it will defeat the
purpose. There has to be something that we can rely on. If it's the
cluster state, then I clearly remember the cluster state showed 2 nodes.

  • dangling indexes are the consequence of two masters alive and connected.
    One master will propose to delete the indexes while the other may not. This
    is random because there is no byzantine fault tolerance in ES - the masters
    will conflict in everything they do and they will not agree about the
    cluster state. There is no known method to recover a split master node to
    join a partial cluster again. And letting the masters continue to coexist
    is highly probable that you let them delete all the indexes that were
    dangling after a delay, at random.

---> Is there any API to find out how many masters are alive in a cluster?
Probably that needs to be considered as well. Can there be a situation where
the master's meta information gets corrupted or lost? Where does ES keep the
meta information? If so, can this trigger a delete request? Looking at the
logs, I could see that about one hour after bringing the 2nd node up it
starts showing shard recovery exceptions, and then a few hours later, along
with the shard recovery exceptions, it starts logging delete index and
dangling index messages. Seems like a multi-organ failure scenario :). If 2
masters are alive, won't their meta information be synced, and will it
still harm the old indices that are not being indexed at that point in time?

Another point is that ES default network timeouts are quite short for
systems that must operate in heavily loaded network enviroments (VMs, EBS)
so first config setting is to increase the network tcp timeouts to mitigate
the risk of node disconnects.

-->> Yes, makes sense -- this property, right? [discovery.zen.ping.timeout]

Maybe one chapter on the ES doc pages should describe the pros and cons of
replica setup and the consequences of data loss and dangling indexes more
clearly since many users seem to neglect the importance of creating a fault
tolerance replica-based setup.

---> Yes very much required!!

Thanks
Amit

On Friday, July 12, 2013 5:35:15 PM UTC+5:30, Amit Singh wrote:


Hi Amit

You did indeed fall foul of the bug in Lucene which deletes shards; this
message in the logs points to it: "shard allocated for local recovery
(post api), should exists, but doesn't"

  • a 2-node cluster is never enough for creating an ES cluster. These
    mini-clusters are naturally exposed for split brains by just taking one
    node down and restarting it without having network connection to the
    running node. The setting minimum_master_nodes will not work in this case.

--->> After bringing up the second node (which went down) the cluster
state showed clearly that the second node has joined. So do you mean the
cluster state api can show false message and not always reliable?

If for some reason, your nodes stop being able to see each other, both
nodes will think that the other node has disappeared and each node will
form a new cluster. Then you will have a split brain. The
minimum_master_nodes setting says "if you don't see at least this many
master-eligible nodes, then don't try to form a cluster". So if you have 3
nodes, and minimum_master_nodes set to 2, then if one node disconnects, it
won't see enough master eligible nodes, and won't form a cluster by itself.
Instead it will keep looking for a cluster to join.
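
To make that concrete, a minimal sketch of the discovery settings for a
3-node setup, mirroring your existing config (node3-ip is a placeholder for
the extra node):

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1-ip", "node2-ip", "node3-ip"]

With three master-eligible nodes, 2 is a proper quorum (more than half), so
an isolated node can never elect itself master. With only two nodes there is
no value that gives you both split-brain protection and tolerance of a
single node failure.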

clint

On 13 July 2013 20:02, Amit Singh amitsingh.kec@gmail.com wrote:


It happened again. The entire data folder was cleaned up!!! Would appreciate
your thoughts.

However, this time I don't see any dangling indices or shard recovery
failure messages.
The last time I started the cluster I enabled debug logging, so here are
some logs, along with a reconstruction of the story.

  1. There is a delete index request fired from our app for only one index
     (this is a genuine request).
  2. Subsequent to that, the cluster started deleting all the indices.

Tracing the code, I could see one place in
IndicesClusterStateService.clusterChanged() where the "applyDeletedIndices"
method checks whether the indices exist in the cluster metadata. When it
does not find them, it sends a delete index command.

Now I am not able to figure out why this scenario would arise. We did not
see any abnormality, and none of the nodes went down.

Here is the piece of log info.
Logs:----

[2013-07-16 12:19:35,437][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: execute
[2013-07-16 12:19:35,437][DEBUG][cluster.service ] [Staging2]
cluster state updated, version [60], source [zen-disco-receive(from master
[[Staging 1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]
[2013-07-16 12:19:35,438][DEBUG][indices.cluster ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56] deleting index
[2013-07-16 12:19:35,438][DEBUG][indices ] [Staging2]
deleting Index [ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56]
[2013-07-16 12:19:35,438][DEBUG][index.service ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56] deleting shard_id
[2]
[2013-07-16 12:19:35,442][DEBUG][index.shard.service ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56][2] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,442][DEBUG][index.service ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56] deleting shard_id
[0]
[2013-07-16 12:19:35,443][DEBUG][index.shard.service ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56][0] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,444][DEBUG][index.service ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56] deleting shard_id
[4]
[2013-07-16 12:19:35,445][DEBUG][index.shard.service ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56][4] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,455][DEBUG][index.cache.filter.weighted] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56] full cache clear,
reason [close]
[2013-07-16 12:19:35,455][DEBUG][index.cache.field.data.resident]
[Staging2] [ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56] full
cache clear, reason [close]
[2013-07-16 12:19:35,460][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: done applying
updated cluster_state
[2013-07-16 12:19:35,460][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: execute
[2013-07-16 12:19:35,460][DEBUG][cluster.service ] [Staging2]
cluster state updated, version [61], source [zen-disco-receive(from master
[[Staging 1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]
[2013-07-16 12:19:35,461][DEBUG][indices.cluster ] [Staging2]
[51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271] deleting index
[2013-07-16 12:19:35,461][DEBUG][indices ] [Staging2]
deleting Index [51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271]
[2013-07-16 12:19:35,462][DEBUG][index.service ] [Staging2]
[51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271] deleting shard_id
[3]
[2013-07-16 12:19:35,464][DEBUG][index.shard.service ] [Staging2]
[51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271][3] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,467][DEBUG][index.service ] [Staging2]
[51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271] deleting shard_id
[2]
[2013-07-16 12:19:35,468][DEBUG][index.shard.service ] [Staging2]
[51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271][2] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,479][DEBUG][index.cache.filter.weighted] [Staging2]
[51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271] full cache clear,
reason [close]
[2013-07-16 12:19:35,479][DEBUG][index.cache.field.data.resident]
[Staging2] [51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271] full
cache clear, reason [close]
[2013-07-16 12:19:35,483][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: done applying
updated cluster_state
[2013-07-16 12:19:35,483][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: execute
[2013-07-16 12:19:35,483][DEBUG][cluster.service ] [Staging2]
cluster state updated, version [62], source [zen-disco-receive(from master
[[Staging 1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]
[2013-07-16 12:19:35,484][DEBUG][indices.cluster ] [Staging2]
[517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1] deleting index
[2013-07-16 12:19:35,484][DEBUG][indices ] [Staging2]
deleting Index [517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1]
[2013-07-16 12:19:35,489][DEBUG][index.service ] [Staging2]
[517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1] deleting shard_id
[4]
[2013-07-16 12:19:35,491][DEBUG][index.shard.service ] [Staging2]
[517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1][4] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,491][DEBUG][index.service ] [Staging2]
[517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1] deleting shard_id
[2]
[2013-07-16 12:19:35,493][DEBUG][index.shard.service ] [Staging2]
[517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1][2] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,502][DEBUG][index.cache.filter.weighted] [Staging2]
[517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1] full cache clear,
reason [close]
[2013-07-16 12:19:35,502][DEBUG][index.cache.field.data.resident]
[Staging2] [517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1] full
cache clear, reason [close]
[2013-07-16 12:19:35,505][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: done applying
updated cluster_state
[2013-07-16 12:19:35,505][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: execute
[2013-07-16 12:19:35,505][DEBUG][cluster.service ] [Staging2]
cluster state updated, version [63], source [zen-disco-receive(from master
[[Staging 1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]
[2013-07-16 12:19:35,506][DEBUG][indices.cluster ] [Staging2]
[51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1] deleting index
[2013-07-16 12:19:35,506][DEBUG][indices ] [Staging2]
deleting Index [51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1]
[2013-07-16 12:19:35,507][DEBUG][index.service ] [Staging2]
[51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1] deleting shard_id
[2]
[2013-07-16 12:19:35,510][DEBUG][index.shard.service ] [Staging2]
[51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1][2] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,510][DEBUG][index.service ] [Staging2]
[51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1] deleting shard_id
[3]
[2013-07-16 12:19:35,512][DEBUG][index.shard.service ] [Staging2]
[51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1][3] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,516][DEBUG][index.cache.filter.weighted] [Staging2]
[51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1] full cache clear,
reason [close]
[2013-07-16 12:19:35,516][DEBUG][index.cache.field.data.resident]
[Staging2] [51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1] full
cache clear, reason [close]
[2013-07-16 12:19:35,522][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: done applying
updated cluster_state
[2013-07-16 12:19:35,553][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: execute
[2013-07-16 12:19:35,553][DEBUG][cluster.service ] [Staging2]
cluster state updated, version [64], source [zen-disco-receive(from master
[[Staging 1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]
[2013-07-16 12:19:35,554][DEBUG][indices.cluster ] [Staging2]
[519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561] deleting index
[2013-07-16 12:19:35,554][DEBUG][indices ] [Staging2]
deleting Index [519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561]
[2013-07-16 12:19:35,554][DEBUG][index.service ] [Staging2]
[519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561] deleting shard_id
[1]
[2013-07-16 12:19:35,554][DEBUG][index.service ] [Staging2]
[519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561] deleting shard_id
[4]
[2013-07-16 12:19:35,557][DEBUG][index.shard.service ] [Staging2]
[519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561][1] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,561][DEBUG][index.shard.service ] [Staging2]
[519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561][4] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,568][DEBUG][index.cache.filter.weighted] [Staging2]
[519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561] full cache clear,
reason [close]
[2013-07-16 12:19:35,568][DEBUG][index.cache.field.data.resident]
[Staging2] [519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561] full
cache clear, reason [close]
[2013-07-16 12:19:35,573][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: done applying
updated cluster_state
[2013-07-16 12:19:36,936][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: execute
[2013-07-16 12:19:36,936][DEBUG][cluster.service ] [Staging2]
cluster state updated, version [65], source [zen-disco-receive(from master
[[Staging 1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]
[2013-07-16 12:19:36,937][DEBUG][indices.cluster ] [Staging2]
[ia517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae] deleting index
[2013-07-16 12:19:36,937][DEBUG][indices ] [Staging2]
deleting Index [ia517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae]
[2013-07-16 12:19:36,937][DEBUG][index.service ] [Staging2]
[ia517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae] deleting shard_id
[1]
[2013-07-16 12:19:36,937][DEBUG][index.service ] [Staging2]
[ia517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae] deleting shard_id
[3]
[2013-07-16 12:19:36,937][DEBUG][index.service ] [Staging2]
[ia517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae] deleting shard_id
[0]


Thanks
Amit

On Sun, Jul 14, 2013 at 3:33 PM, Clinton Gormley clint@traveljury.com wrote:

Hi Amit

You did indeed fall foul of the bug in Lucene which deletes shards; this
message in the logs points to it: "shard allocated for local recovery
(post api), should exists, but doesn't"

  • a 2-node cluster is never enough for a resilient ES cluster. Such
    mini-clusters are naturally exposed to split brain: just take one node
    down and restart it without a network connection to the running node.
    The minimum_master_nodes setting will not work in this case.

--->> After bringing up the second node (which went down), the cluster
state showed clearly that the second node had joined. So do you mean the
cluster state api can show misleading information and is not always reliable?

If for some reason, your nodes stop being able to see each other, both
nodes will think that the other node has disappeared and each node will
form a new cluster. Then you will have a split brain. The
minimum_master_nodes setting says "if you don't see at least this many
master-eligible nodes, then don't try to form a cluster". So if you have 3
nodes, and minimum_master_nodes set to 2, then if one node disconnects, it
won't see enough master eligible nodes, and won't form a cluster by itself.
Instead it will keep looking for a cluster to join.
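
Concretely: with three master-eligible nodes the quorum is (3 / 2) + 1 = 2,
so each node's elasticsearch.yml would carry something like this (a sketch
only; the third host name is illustrative, the setting names are the same
ones used in the original config):

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1-ip", "node2-ip", "node3-ip"]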

clint

On 13 July 2013 20:02, Amit Singh amitsingh.kec@gmail.com wrote:

My further addition,

responses inline, marked --->

On Sat, Jul 13, 2013 at 1:37 PM, Jörg Prante joergprante@gmail.com
wrote:

Must agree with Ivan, and sorry for being impolite.

Amit, ES has replica level 1 active by default for very good reason. By
setting replica level 0, you knowingly opted out of data recovery and
accepted the risk of data loss.

--->> Fully aware of this fact! I am fine if I lose a node and, because of
that, lose data. But I am struggling to find out why it would happen
otherwise. Another thing I need to understand is how a replica will help
here. Do you mean that if I have a replica, ES will check against it before
sending a delete request, or will it delete the replica as well? If it
checks the replica beforehand, then it makes sense. A related question: if
a shard/index gets corrupted (I have yet to find the actual reason why it
gets corrupted), what is the chance of the replica getting corrupted as
well? I experienced this a couple of times earlier, and hence I could not
see much importance in replicas apart from the fact that they can serve
additional search requests. Some education here will definitely help.
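
For reference, the setting being discussed is the one from the original
config, raised from 0 to 1 so that every primary shard keeps a copy on the
other node (a sketch of the yml line only; it applies to newly created
indices, and existing indices would need their replica count raised
separately through the index settings API):

index.number_of_replicas: 1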

ES itself provides only a simple, Lucene-based shard index checker for
data repair. But this is only an emergency tool for file system crashes or
lost files, and there is no guarantee of recovering anything.

Amit, you have the following issues:

  • a 2-node cluster is never enough for a resilient ES cluster. Such
    mini-clusters are naturally exposed to split brain: just take one node
    down and restart it without a network connection to the running node.
    The minimum_master_nodes setting will not work in this case.

--->> After bringing up the second node (which went down), the cluster
state showed clearly that the second node had joined. So do you mean the
cluster state api can show misleading information and is not always reliable?

  • ES does not tell you if there is a split brain. There is only the
    cluster health status (red, yellow, green). You must strictly obey it in
    your clients while indexing and immediately stop indexing when the
    cluster is not green, or if operations fail. Any data you push into a
    cluster that is not green is at risk: it may not be replayable or
    recoverable by the cluster later.

---> Great suggestion. Will implement this. But if the cluster state is
not reliable (based on the previous explanation) then it will defeat the
purpose. There has to be something that we can rely on. If it's the cluster
state, then I clearly remember it showed 2 nodes.

  • dangling indexes are the consequence of two masters being alive and
    connected. One master will propose to delete the indexes while the other
    may not. This is random because there is no Byzantine fault tolerance in ES
  • the masters will conflict in everything they do and they will not agree
    about the cluster state. There is no known method to recover a split master
    node so that it joins a partial cluster again. And if you let the masters
    continue to coexist, it is highly probable that they will delete, at
    random and after a delay, all the indexes that were dangling.

---> Is there any api to find how many masters are alive in a cluster?
Probably that needs to be considered as well. Can there be a situation
where the master's meta information gets corrupted or lost? Where does ES
keep the meta information, and if it is lost, can that trigger delete
requests? Looking at the logs I could see that about one hour after
bringing the 2nd node up it starts showing shard recovery exceptions, and
then after a few hours, along with the shard recovery exceptions, it starts
logging delete index messages and also dangling index messages. Seems like
a multi-organ failure scenario :). If 2 masters are alive, won't their meta
information be synched, and would it still harm the old indices that were
not being indexed at that point in time?
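
On the "how many masters" question: the cluster state API reports the id of
the elected master (the master_node field), so querying each node separately
and comparing is one way to spot a split. A restart-related mitigation is to
hold off recovery until enough nodes have rejoined, using the gateway
settings, roughly like this (a sketch; the values are only illustrative for
a 2-node cluster):

gateway.recover_after_nodes: 2
gateway.recover_after_time: 5m
gateway.expected_nodes: 2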

Another point is that the ES default network timeouts are quite short for
systems that must operate in heavily loaded network environments (VMs, EBS),
so a first config step is to increase the network/TCP timeouts to mitigate
the risk of node disconnects.

-->> Yes, makes sense -- this property, right? [discovery.zen.ping.timeout]
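
That one, plus the zen fault-detection timeouts that govern how quickly an
already-joined node is declared dead. The relevant block would look
something like this (a sketch; the values are illustrative, not
recommendations):

discovery.zen.ping.timeout: 10s
discovery.zen.fd.ping_interval: 5s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 5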

Maybe one chapter on the ES doc pages should describe the pros and cons
of replica setup and the consequences of data loss and dangling indexes
more clearly, since many users seem to neglect the importance of creating a
fault-tolerant, replica-based setup.

---> Yes very much required!!

Thanks
Amit



--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Amit,

By any chance have you fired a delete request without any index value? If
so, it will delete all your indexes.
Have a look at this link:
http://www.elasticsearch.org/guide/reference/api/admin-indices-delete-index/

The delete index API can also be applied to more than one index, or on
_all indices (be careful!). All indices will also be deleted when no
specific index is provided. To disable the ability to delete all indices,
set action.disable_delete_all_indices to true in the config.
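
In yml form that guard is a one-liner, probably worth having on any cluster
where an accidental delete with no index name would be fatal:

action.disable_delete_all_indices: true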

Cheers
Rahul

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.