IndexMissingException after network failure

Hi,

We're experiencing a problem where the client constantly throws an
IndexMissingException when searching after rejoining the cluster after a
network failure. The server log shows that the client is removed after
pinging it failed. When the network comes up again the log shows that
the client is added as expected but the client starts to throw
IndexMissingException when searching in an index that is known to exist.
I've managed to create a simple test that reproduces this problem. Here
are the steps:

Start an ES 0.19.2 server instance (default config):

./bin/elasticsearch -f

Start the attached Java program on the same machine as the server. I'm
using the attached elasticsearch.yml. The code will create the index
'estest' and then run a match_all search every 2 seconds.

Drop all network traffic on ports 9300 and 9301 to simulate a network
failure:

sudo iptables -A INPUT -p tcp --destination-port 9300 -j DROP
sudo iptables -A INPUT -p tcp --destination-port 9301 -j DROP

Wait for the server and client to give up on pinging the other side.

Start network traffic again:

sudo iptables --flush

Client and server will appear to find each other again but then the
client will start to throw IndexMissingException for every search.
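
The block and unblock steps above could be scripted roughly as follows. This is only a sketch: the DRY_RUN switch and the function names are made up for illustration and are not part of the attached test. It appends the same two DROP rules, but removes them again with `iptables -D` instead of `iptables --flush`, since a flush also deletes any unrelated rules in the table.

```shell
#!/bin/sh
# Sketch: block and unblock the ES transport ports used in the steps above.
# DRY_RUN, run, block_ports and unblock_ports are illustrative names (assumptions).

PORTS="9300 9301"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; set to 0 to execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Append a DROP rule for each port (the same rules as in the steps above).
block_ports() {
    for port in $PORTS; do
        run sudo iptables -A INPUT -p tcp --destination-port "$port" -j DROP
    done
}

# Delete exactly those rules again, leaving other iptables rules untouched.
unblock_ports() {
    for port in $PORTS; do
        run sudo iptables -D INPUT -p tcp --destination-port "$port" -j DROP
    done
}
```

After sourcing the file, calling block_ports with DRY_RUN=0 applies the rules, and unblock_ports removes them once both sides have given up pinging. With the default DRY_RUN=1 the commands are only printed, which makes it easy to check them first.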

It would be great if someone else could try this too and let me know if
it can be reproduced or if it's only me. I'm not sure whether using iptables
like this is a good way to simulate the network problems, though
we're seeing the exact same symptoms when there's a real network problem
in our production environment. In production the client and server are
on different machines.

Thanks!
Niklas Therning

Heya,

First, thanks for the great recreation! It was a nasty one to track
down, but will be fixed shortly. Here is the issue:
When a node disconnects from the cluster (not enough master nodes, or a client node) and rejoins, it might not update its internal routing table (issue #1896, elastic/elasticsearch on GitHub).

-shay.banon

On Sun, Apr 29, 2012 at 4:10 PM, Niklas Therning niklas@therning.org wrote:
