Minor consistency issues with a 3 node cluster on 0.12

Running 0.12 on a 3-node cluster, I built up ~25 million documents.
Afterwards, everything was good and a match-all query returned the
same number of results every time.

I issued a cluster shutdown command and then started every node back
up. One node got a head start, as a couple of them were timing out of
discovery after 30 seconds. After a couple of minutes I got the other
two nodes started.
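
(For reference, the shutdown step here would have gone through the nodes
shutdown REST endpoint; this is only a sketch, and the exact path and host
are assumptions for a 0.12-era cluster:)

  # Ask every node in the cluster to shut down (sketch; endpoint path assumed)
  curl -XPOST 'http://localhost:9200/_cluster/nodes/_shutdown'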

I am now in a green state, but when I execute a match-all query the
result count fluctuates by +/- 7 documents.
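
(The kind of spot check being described can be reproduced against the REST
API along these lines; a sketch only, with localhost:9200 assumed as the
endpoint:)

  # Confirm the cluster is green, then repeat a match-all search a few
  # times; hits.total should be identical on every run if the shards agree.
  curl 'http://localhost:9200/_cluster/health?pretty=true'
  curl 'http://localhost:9200/_search?q=*:*&size=0&pretty=true'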

At no time did this cluster go split-brain, and the data was consistent
before the cluster shutdown.

Here is my hunch:

  • The first node to start got a head start
  • It recovered a shard or shards from the gateway because they weren't present locally
  • The other two nodes started up and began local recovery, and the local
    shards must (may?) have been out of sync with the ones on the gateway.

Any idea what could have happened and how to prevent this?

Thanks,
Paul

When using the shared gateway, only the first shard in a replication group
recovers its state from the gateway (and it does not re-recover what it
already has in its work dir). Then, the other shard replicas recover their
state from the already-running shard (reusing what they can of their local
storage). What you got is strange; can I have more details regarding:

  1. Do you still have the logs? Can you mail them to me?
  2. Was there an indexing process running while this happened?
  3. Was the work dir cleared?
  4. (This should have been the first question:) You are using a shared NFS gateway, right?

-shay.banon

Hey Shay,
To answer your questions:

  1. I think so. I'll ping you with details.
  2. All indexing was completely disabled. We have an indexing app using
    the Java APIs, and it is the only way we index data into ES.
  3. The work directory was not touched.
  4. We are using a shared NFS setup and have a dedicated file server in
    place to host the share. The config for this should be good. On the
    client side, the mount command gives me:
    10.16.104.11:/opt/wsod/esgateway on /esgateway type nfs
    (rw,hard,intr,addr=10.16.104.11)

On the server side my exports file looks like:
/opt/wsod/esgateway 10.16.0.0/255.255.0.0(rw,sync,no_root_squash)
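
(For completeness, the node-side settings that point Elasticsearch at a
share like this would look roughly as follows; the keys are my reading of
the 0.12-era fs gateway settings, and the path is simply the mount point
above:)

  # elasticsearch.yml (sketch): store the shared gateway on the NFS mount
  gateway.type: fs
  gateway.fs.location: /esgateway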

You should see details on the log files shortly.

Thanks,
Paul

Hey,
Sorry for the delay. There is nothing in the logs from when this was
occurring; everything shows a happy startup. I have enabled all recovery
logging, so hopefully I will have data the next time this occurs.
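
(For reference, turning on that logging amounts to raising the gateway and
recovery loggers to a verbose level in config/logging.yml, roughly as below;
the specific logger names are assumptions based on the package layout and
may differ on 0.12:)

  # config/logging.yml (sketch): verbose gateway and shard recovery logging
  logger:
    gateway: TRACE
    index.gateway: TRACE
    index.shard: DEBUG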

Thanks,
Paul
