It happened again. The entire data folder was cleaned up!!! Would appreciate your thoughts.
However, this time I don't see any dangling indices or shard recovery failure messages.
The last time I started the cluster I had debug mode enabled, so here are some logs, along with a reconstruction of what happened.
- A delete-index request was fired from our app for only one index (this was a genuine request).
- Subsequent to that, the cluster started deleting all the indices.
Tracing the code, I can see one place in IndicesClusterStateService.clusterChanged() where the applyDeletedIndices method checks whether each local index still exists in the cluster metadata. When it does not find an index there, it sends a delete-index command for it.
I am not able to figure out why this scenario would arise. We did not see any abnormality, and none of the nodes went down.
Here is the relevant piece of the log.
Logs:----
[2013-07-16 12:19:35,437][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: execute
[2013-07-16 12:19:35,437][DEBUG][cluster.service ] [Staging2]
cluster state updated, version [60], source [zen-disco-receive(from master
[[Staging 1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]
[2013-07-16 12:19:35,438][DEBUG][indices.cluster ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56] deleting index
[2013-07-16 12:19:35,438][DEBUG][indices ] [Staging2]
deleting Index [ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56]
[2013-07-16 12:19:35,438][DEBUG][index.service ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56] deleting shard_id
[2]
[2013-07-16 12:19:35,442][DEBUG][index.shard.service ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56][2] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,442][DEBUG][index.service ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56] deleting shard_id
[0]
[2013-07-16 12:19:35,443][DEBUG][index.shard.service ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56][0] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,444][DEBUG][index.service ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56] deleting shard_id
[4]
[2013-07-16 12:19:35,445][DEBUG][index.shard.service ] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56][4] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,455][DEBUG][index.cache.filter.weighted] [Staging2]
[ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56] full cache clear,
reason [close]
[2013-07-16 12:19:35,455][DEBUG][index.cache.field.data.resident]
[Staging2] [ia519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f56] full
cache clear, reason [close]
[2013-07-16 12:19:35,460][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: done applying
updated cluster_state
[2013-07-16 12:19:35,460][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: execute
[2013-07-16 12:19:35,460][DEBUG][cluster.service ] [Staging2]
cluster state updated, version [61], source [zen-disco-receive(from master
[[Staging 1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]
[2013-07-16 12:19:35,461][DEBUG][indices.cluster ] [Staging2]
[51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271] deleting index
[2013-07-16 12:19:35,461][DEBUG][indices ] [Staging2]
deleting Index [51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271]
[2013-07-16 12:19:35,462][DEBUG][index.service ] [Staging2]
[51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271] deleting shard_id
[3]
[2013-07-16 12:19:35,464][DEBUG][index.shard.service ] [Staging2]
[51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271][3] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,467][DEBUG][index.service ] [Staging2]
[51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271] deleting shard_id
[2]
[2013-07-16 12:19:35,468][DEBUG][index.shard.service ] [Staging2]
[51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271][2] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,479][DEBUG][index.cache.filter.weighted] [Staging2]
[51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271] full cache clear,
reason [close]
[2013-07-16 12:19:35,479][DEBUG][index.cache.field.data.resident]
[Staging2] [51b6bf7be4b0265937c8c6ba51e3f374e4b0f6e534eefb271] full
cache clear, reason [close]
[2013-07-16 12:19:35,483][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: done applying
updated cluster_state
[2013-07-16 12:19:35,483][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: execute
[2013-07-16 12:19:35,483][DEBUG][cluster.service ] [Staging2]
cluster state updated, version [62], source [zen-disco-receive(from master
[[Staging 1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]
[2013-07-16 12:19:35,484][DEBUG][indices.cluster ] [Staging2]
[517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1] deleting index
[2013-07-16 12:19:35,484][DEBUG][indices ] [Staging2]
deleting Index [517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1]
[2013-07-16 12:19:35,489][DEBUG][index.service ] [Staging2]
[517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1] deleting shard_id
[4]
[2013-07-16 12:19:35,491][DEBUG][index.shard.service ] [Staging2]
[517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1][4] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,491][DEBUG][index.service ] [Staging2]
[517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1] deleting shard_id
[2]
[2013-07-16 12:19:35,493][DEBUG][index.shard.service ] [Staging2]
[517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1][2] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,502][DEBUG][index.cache.filter.weighted] [Staging2]
[517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1] full cache clear,
reason [close]
[2013-07-16 12:19:35,502][DEBUG][index.cache.field.data.resident]
[Staging2] [517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae1] full
cache clear, reason [close]
[2013-07-16 12:19:35,505][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: done applying
updated cluster_state
[2013-07-16 12:19:35,505][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: execute
[2013-07-16 12:19:35,505][DEBUG][cluster.service ] [Staging2]
cluster state updated, version [63], source [zen-disco-receive(from master
[[Staging 1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]
[2013-07-16 12:19:35,506][DEBUG][indices.cluster ] [Staging2]
[51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1] deleting index
[2013-07-16 12:19:35,506][DEBUG][indices ] [Staging2]
deleting Index [51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1]
[2013-07-16 12:19:35,507][DEBUG][index.service ] [Staging2]
[51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1] deleting shard_id
[2]
[2013-07-16 12:19:35,510][DEBUG][index.shard.service ] [Staging2]
[51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1][2] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,510][DEBUG][index.service ] [Staging2]
[51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1] deleting shard_id
[3]
[2013-07-16 12:19:35,512][DEBUG][index.shard.service ] [Staging2]
[51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1][3] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,516][DEBUG][index.cache.filter.weighted] [Staging2]
[51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1] full cache clear,
reason [close]
[2013-07-16 12:19:35,516][DEBUG][index.cache.field.data.resident]
[Staging2] [51d2faaae4b0bf8dd876b34351d3b5e9e4b08a3c2d4d27ac1] full
cache clear, reason [close]
[2013-07-16 12:19:35,522][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: done applying
updated cluster_state
[2013-07-16 12:19:35,553][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: execute
[2013-07-16 12:19:35,553][DEBUG][cluster.service ] [Staging2]
cluster state updated, version [64], source [zen-disco-receive(from master
[[Staging 1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]
[2013-07-16 12:19:35,554][DEBUG][indices.cluster ] [Staging2]
[519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561] deleting index
[2013-07-16 12:19:35,554][DEBUG][indices ] [Staging2]
deleting Index [519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561]
[2013-07-16 12:19:35,554][DEBUG][index.service ] [Staging2]
[519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561] deleting shard_id
[1]
[2013-07-16 12:19:35,554][DEBUG][index.service ] [Staging2]
[519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561] deleting shard_id
[4]
[2013-07-16 12:19:35,557][DEBUG][index.shard.service ] [Staging2]
[519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561][1] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,561][DEBUG][index.shard.service ] [Staging2]
[519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561][4] state:
[STARTED]->[CLOSED], reason [deleting index]
[2013-07-16 12:19:35,568][DEBUG][index.cache.filter.weighted] [Staging2]
[519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561] full cache clear,
reason [close]
[2013-07-16 12:19:35,568][DEBUG][index.cache.field.data.resident]
[Staging2] [519c8e26e4b0c26f19968f53519c8e7ee4b0c26f19968f561] full
cache clear, reason [close]
[2013-07-16 12:19:35,573][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: done applying
updated cluster_state
[2013-07-16 12:19:36,936][DEBUG][cluster.service ] [Staging2]
processing [zen-disco-receive(from master [[Staging
1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]: execute
[2013-07-16 12:19:36,936][DEBUG][cluster.service ] [Staging2]
cluster state updated, version [65], source [zen-disco-receive(from master
[[Staging 1][XHvZshrYRzScdfqO5RaWxA][inet[/10.190.209.134:9300]]])]
[2013-07-16 12:19:36,937][DEBUG][indices.cluster ] [Staging2]
[ia517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae] deleting index
[2013-07-16 12:19:36,937][DEBUG][indices ] [Staging2]
deleting Index [ia517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae]
[2013-07-16 12:19:36,937][DEBUG][index.service ] [Staging2]
[ia517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae] deleting shard_id
[1]
[2013-07-16 12:19:36,937][DEBUG][index.service ] [Staging2]
[ia517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae] deleting shard_id
[3]
[2013-07-16 12:19:36,937][DEBUG][index.service ] [Staging2]
[ia517e4b01e4b0504899be6f0251e51429e4b03b128161e8ae] deleting shard_id
[0]
Thanks
Amit
On Sun, Jul 14, 2013 at 3:33 PM, Clinton Gormley clint@traveljury.com wrote:
Hi Amit
You did indeed fall foul of the bug in Lucene which deletes shards; this message in the logs indicates it: "shard allocated for local recovery (post api), should exists, but doesn't"
- A 2-node cluster is never enough for a reliable ES cluster. These mini-clusters are naturally exposed to split brain simply by taking one node down and restarting it without a network connection to the running node. The minimum_master_nodes setting will not help in this case.
--->> After bringing up the second node (which went down), the cluster state clearly showed that the second node had joined. So do you mean the cluster state API can report a false picture and is not always reliable?
If for some reason your nodes stop being able to see each other, both nodes will think that the other node has disappeared, and each node will form its own cluster. Then you will have a split brain. The minimum_master_nodes setting says "if you don't see at least this many master-eligible nodes, then don't try to form a cluster". So if you have 3 nodes and minimum_master_nodes set to 2, then if one node disconnects, it won't see enough master-eligible nodes and won't form a cluster by itself. Instead it will keep looking for a cluster to join.
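As an illustration (not your exact setup), the relevant bits of elasticsearch.yml for a three-node cluster would look something like this; node3-ip is just a placeholder for the third node:

cluster.name: Staging
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1-ip", "node2-ip", "node3-ip"]
# quorum of master-eligible nodes: (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2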
clint
On 13 July 2013 20:02, Amit Singh amitsingh.kec@gmail.com wrote:
My further additions; responses are inline (blue text) --->
On Sat, Jul 13, 2013 at 1:37 PM, Jörg Prante joergprante@gmail.com
wrote:
Must agree with Ivan, and sorry for being impolite.
Amit, ES has replica level 1 active by default for a very good reason. By setting the replica level to 0, you knowingly opted out of data recovery and accepted the possibility of data loss.
--->> Fully aware of this fact! I am fine if I lose a node and, because of that, lose data. But I am struggling to find out why it would happen otherwise. Another thing I need to understand is how a replica will help here. Do you mean that if I have a replica, it will be checked before a delete request is sent, or will the replica be deleted as well? If the replica is checked beforehand, then that makes sense. Another related question: if a shard/index gets corrupted (I have yet to find the actual reason why it gets corrupted), what is the chance of the replica getting corrupted as well? I experienced this a couple of times earlier, and hence I could not see much importance in replicas apart from the fact that they can serve additional search requests. Some sort of education would definitely help here.
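(For completeness: if I understand correctly, turning replicas back on for new indices is just the following in elasticsearch.yml, or the equivalent dynamic index setting; please correct me if that is wrong.)

# keep one extra copy of every shard on a different node
index.number_of_replicas: 1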
ES itself provides only a simple, Lucene-based shard index checker for data repair. But this is only an emergency tool for file system crashes or lost files, and there is no guarantee it will recover anything.
Amit, you have the following issues:
- A 2-node cluster is never enough for a reliable ES cluster. These mini-clusters are naturally exposed to split brain simply by taking one node down and restarting it without a network connection to the running node. The minimum_master_nodes setting will not help in this case.
--->> After bringing up the second node (which went down), the cluster state clearly showed that the second node had joined. So do you mean the cluster state API can report a false picture and is not always reliable?
- ES does not tell you if there is a split brain. There is a cluster state (red, yellow, green). You must strictly obey the cluster state in your clients while indexing, and immediately stop indexing when the cluster is not green or if operations fail. Any data you push into a cluster that is not green is at risk of not being replayable/recoverable by the ES cluster later.
---> Great suggestion. Will implement this. But if the cluster state is not reliable (based on the previous explanation), then it will defeat the purpose. There has to be something we can rely on. If it's the cluster state, then I clearly remember the cluster state showed 2 nodes.
- Dangling indexes are the consequence of two masters being alive and connected. One master will propose to delete the indexes while the other may not. This is random because there is no Byzantine fault tolerance in ES - the masters will conflict in everything they do and will not agree about the cluster state. There is no known method for recovering a split master node so that it joins a partial cluster again. And if you let the masters continue to coexist, it is highly probable that they will delete all the indexes that were dangling, after a delay, at random.
---> Is there any API to find out how many masters are alive in a cluster? That probably needs to be considered as well. Can there be a situation where the master's meta information gets corrupted or lost? Where does ES keep the meta information? If so, can this trigger delete requests? Looking at the logs I could see that, about one hour after bringing the 2nd node up, it started showing shard recovery exceptions, and then after a few hours, along with the shard recovery exceptions, it started showing delete index messages. It also started showing dangling index messages. Seems like a multi-organ failure scenario :). If 2 masters are alive, won't their meta information be synced, and will it still harm the old indices which were not being indexed at that point in time?
Another point is that the ES default network timeouts are quite short for systems that must operate in heavily loaded network environments (VMs, EBS), so the first config change is to increase the network/TCP timeouts to mitigate the risk of node disconnects.
-->> Yes, makes sense -- this property, right? [discovery.zen.ping.timeout]
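Something along these lines is what I have in mind; the fd.* fault-detection names below are from memory, so please correct me if they are wrong:

# discovery ping (default is 3s)
discovery.zen.ping.timeout: 10s
# zen fault detection between nodes
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 5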
Maybe one chapter in the ES docs should describe the pros and cons of a replica setup and the consequences of data loss and dangling indexes more clearly, since many users seem to neglect the importance of creating a fault-tolerant, replica-based setup.
---> Yes, very much required!!
Thanks
Amit
On Friday, July 12, 2013 5:35:15 PM UTC+5:30, Amit Singh wrote:
Hi All,
I have an Elasticsearch cluster of 2 nodes in my staging environment. Both nodes have the following config:
cluster.name: Staging
node.name: "node1"
index.number_of_shards: 5
index.number_of_replicas: 0
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1-ip", "node2-ip"]
cluster.routing.allocation.node_initial_primaries_recoveries: 8
network.bind_host: node1-ip
network.host: node1-ip
path.data: /mnt/common/es/data,/ebs1/common/es/data
path.work: /mnt/common/es/work
path.logs: /mnt/common/es/logs
The cluster had been running fine for months. Since the staging environment is hosted on AWS, we occasionally had to restart the nodes after network breaks. And sometimes we would end up with unassigned shards, which we used to delete manually via an API call.
Today, all of a sudden, I saw no data in the data directory.
I looked at the logs and they said "dangling index scheduled to delete in 2 hrs" and "deleting dangling index". And it deleted all the dangling indices.
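If I am reading the local gateway settings correctly, this two-hour behaviour seems to be governed by settings along these lines (names from memory, so please verify):

# whether dangling indices are imported back instead of being deleted
gateway.local.auto_import_dangled: yes
# how long a dangling index is kept before it is deleted
gateway.local.dangling_timeout: 2h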
This is extremely scary. We lost all our data, and if it happens in production we are literally dead. We take backups of the production environment using rsync, and those could also get deleted during a sync that runs after the dangling deletion.
Would appreciate your advice.
I understand there is no way to recover the data. But what I am interested in knowing is why it happened.
Note: 1) There are no other clusters with the same name. 2) There is no trace of nodes going down.
Thanks
Amit