Hi,
Sometimes, after a full restart of my cluster, some indices turn up
empty (0 docs, yet green). All data in these indices seems to be lost!
My configuration is the following:
version 0.16.4 (snapshot); the problem also occurred in a snapshotted version of 0.16.3
40 nodes, ~100 indices
1 shard per index and 1 replica
local gateway with unicast discovery
I am using rivers to index new data into the cluster, and they are
automatically started while the recovery process is still running.
Indices are also created on the fly.
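For reference, the relevant part of my elasticsearch.yml looks roughly like this (reproduced from memory, so treat the exact layout as a sketch):

gateway:
    type: local
index:
    number_of_shards: 1
    number_of_replicas: 1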
When I start the cluster, the nodes join one after another. Once all
40 nodes are present, recovery starts: all indices turn yellow and
then slowly green.
The problem is that some indices turn green very quickly and have lost
all their data. Grepping the log files for the affected indices does
not reveal anything.
Looking at the data directories, I found that several servers still
contain data in /elasticsearch/search/nodes/0/indices/index001858.
The size of this directory per server is:
Server A) 4.0KB
Server B) 1.4GB
Server C) 36KB
Server D) 564MB
Server E) 4.0KB
Server F) 946MB
Server G) 28KB
The cluster state has the index allocated on servers C and G, holding 36KB
and 28KB, and not, for instance, on server B, which holds 1.4GB.
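(I am reading the allocation from the cluster state API and grepping for the index name, roughly like this:)

curl -XGET 'http://localhost:9200/_cluster/state?pretty=true' | grep -A 20 index001858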
I am wondering why the data that is still available on disk is not being
recovered, and why an empty copy of the index is recovered instead.
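For what it's worth, I have not configured any of the gateway recovery thresholds. As far as I understand the local gateway documentation, something like the following would make the cluster wait for more nodes before starting recovery (the numbers are just an example), but I do not know whether this is related to the empty indices:

gateway:
    recover_after_nodes: 35
    recover_after_time: 5m
    expected_nodes: 40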
On startup I am also getting some warning messages about multicast,
although I am using unicast discovery:
discovery:
    zen.ping:
        multicast:
            enabled: false
        unicast:
            hosts: 192.168.5.1[9300], 192.168.5.2[9300], 192.168.5.3[9300], 192.168.5.4[9300]
The warning messages I am getting:
[2011-07-14 17:16:54,481][WARN ][transport.netty ] [Wagner,
Kurt] Exception caught on netty layer [[id: 0x1ebe99f8]]
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:30)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:480)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:140)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:103)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:555)
at org.elasticsearch.common.netty.channel.Channels.connect(Channels.java:541)
at org.elasticsearch.common.netty.channel.AbstractChannel.connect(AbstractChannel.java:218)
at org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:227)
at org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:188)
at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:504)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:475)
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:126)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:198)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2011-07-14 17:16:54,482][WARN ][transport.netty ] [Wagner,
Kurt] Exception caught on netty layer [[id: 0x736e788c]]
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:30)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:480)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:140)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:103)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:555)
at org.elasticsearch.common.netty.channel.Channels.connect(Channels.java:541)
at org.elasticsearch.common.netty.channel.AbstractChannel.connect(AbstractChannel.java:218)
at org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:227)
at org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:188)
at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:507)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:475)
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:126)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:198)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2011-07-14 17:16:54,625][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [1]
[2011-07-14 17:16:54,676][INFO ][cluster.service ] [Wagner,
Kurt] detected_master [Cage,
Luke][613gLOsUSbGjE5hfZYWutQ][inet[/192.168.5.3:9300]], added {[Kine,
Benedict][0gST1OnzQbSrk5VMoUVmTw][inet[/192.168.6.9:9300]],[Matador][K2cbli_9QRyUO4fRwr21Xg][inet[/192.168.5.2:9300]],[Death-Stalker][6X5mT0NAQ2aoHellP-5U0A][inet[/192.168.5.8:9300]],[Briquette][GaKPrDx-RaqBudM2ZOuj1Q][inet[/192.168.5.10:9300]],[Grenade][wDpt3uSVReuPePjgiMdMXg][inet[/192.168.5.4:9300]],[Cage,
Luke][613gLOsUSbGjE5hfZYWutQ][inet[/192.168.5.3:9300]],[Red
Guardian][HhnTAOKbRdatVMGBrBEKTw][inet[/192.168.6.19:9300]],[Dragonwing][UZxd5EIuRMCy1wT6sExdfQ][inet[/192.168.6.20:9300]],[Hobgoblin][GpfT4JseQpWvrTYS62KsEw][inet[/192.168.5.12:9300]],[Tom
Cassidy][LscBtoZVQ9m5PF8yM5c1Fw][inet[/192.168.5.14:9300]],[Bob][nmw1mj92SKCLIxJ3ZxQ4ww][inet[/192.168.6.18:9300]],[Sligguth][UoqEO4uAR_ORTnIbx-508Q][inet[/192.168.5.15:9300]],[Invisible
Woman][kMRXVw57QgKA_UkE5_OfjQ][inet[/192.168.5.7:9300]],[Nathaniel
Richards][fmd7wh52Sc6FhB9zTxCj3Q][inet[/192.168.6.1:9300]],[Xavin][_oDz7oXhT8SiwTlZkTqZBg][inet[/192.168.5.1:9300]],},
reason: zen-disco-receive(from master [[Cage,
Luke][613gLOsUSbGjE5hfZYWutQ][inet[/192.168.5.3:9300]]])
[2011-07-14 17:16:55,061][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [1]
[2011-07-14 17:16:55,365][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:55,635][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [1]
[2011-07-14 17:16:55,943][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:55,953][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:55,961][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:56,176][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [1]
[2011-07-14 17:16:56,537][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:56,798][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:56,888][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:56,891][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:57,000][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:57,048][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [1]
[2011-07-14 17:16:57,246][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [1]
[2011-07-14 17:16:57,630][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [1]
[2011-07-14 17:16:57,797][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:57,801][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:57,991][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [1]
[2011-07-14 17:16:58,263][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:58,372][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:58,768][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [1]
[2011-07-14 17:16:58,789][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [1]
[2011-07-14 17:16:58,983][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:59,186][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [1]
[2011-07-14 17:16:59,824][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:16:59,839][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [2]
[2011-07-14 17:17:00,059][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [1]
[2011-07-14 17:17:00,253][WARN ][discovery.zen.ping.multicast]
[Wagner, Kurt] received ping response with no matching id [1]
[2011-07-14 17:17:00,522][INFO ][cluster.service ] [Wagner,
Kurt] added {[Mangle][DZCqZ0yIR8mb7_HC1Kbouw][inet[/192.168.6.11:9300]],},
reason: zen-disco-receive(from master [[Cage,
Luke][613gLOsUSbGjE5hfZYWutQ][inet[/192.168.5.3:9300]]])
[2011-07-14 17:17:00,524][INFO ][cluster.service ] [Wagner,
Kurt] added {[Kala][cV775zgaRke6KCz1HZe-fA][inet[/192.168.6.14:9300]],},
reason: zen-disco-receive(from master [[Cage,
Luke][613gLOsUSbGjE5hfZYWutQ][inet[/192.168.5.3:9300]]])
[2011-07-14 17:17:00,525][INFO ][cluster.service ] [Wagner,
Kurt] added {[Tiboldt,
Maynard][orYVQrKyQdiDK-FdB0TaFw][inet[/192.168.6.10:9300]],}, reason:
zen-disco-receive(from master [[Cage,
Luke][613gLOsUSbGjE5hfZYWutQ][inet[/192.168.5.3:9300]]])
I also observe that on every cluster restart, Elasticsearch moves all
replicas to different nodes, which causes a lot of traffic. Is this
normal behavior, and could it be related to the problem above?
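(I watch this happening by polling the cluster health at shard level during the restart, along these lines:)

curl -XGET 'http://localhost:9200/_cluster/health?level=shards&pretty=true'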
Best,
Michel