Missing transactions after starting new node

We're running a 0.13.0 snapshot with the local gateway. We have 78 shards,
28M docs, and 170 GB of data.

When a new node is added to the cluster, I see a small difference in the
total number of docs (seen via the REST API) once all the shards have been
reallocated. We were actively indexing during the shard reallocations,
though at a modest, steady rate. Rerunning the transactions from the past
hour made the search counts consistent again.
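For anyone doing the same kind of catch-up replay, here is a minimal sketch of selecting what to rerun. It assumes a hypothetical application-side log in which every indexing operation records an explicit document id and a timestamp; `Txn` and `txns_to_replay` are illustrative names, not part of any Elasticsearch API. Keeping only the latest operation per id within the window and re-indexing by that id makes the replay idempotent, since indexing an existing id overwrites the document rather than adding a second copy. Deletes are not handled here and would need separate treatment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Txn:
    """One indexing operation from a hypothetical application-side log."""
    doc_id: str
    indexed_at: datetime
    body: dict


def txns_to_replay(log, now, window=timedelta(hours=1)):
    """Return the latest operation per document id within `window` of `now`.

    Replaying by explicit document id is idempotent: re-indexing a document
    that already exists overwrites it instead of creating a duplicate.
    """
    cutoff = now - window
    latest = {}
    # Walk operations oldest-first so later operations on the same id win.
    for txn in sorted(log, key=lambda t: t.indexed_at):
        if txn.indexed_at >= cutoff:
            latest[txn.doc_id] = txn
    return list(latest.values())
```

Each returned `Txn` would then be re-indexed under its `doc_id`, so documents that were not actually lost are simply overwritten with the same content.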

This happened on two different clusters. In the first, there was one
node and a second was added. In the second, two nodes were in the
cluster and a third was added. In both cases, all the machines in the
cluster have identical hardware.

Please let me know what log files you may be interested in.

Thanks,

David

Do you remember which 0.13 snapshot you are running, or roughly its date? I
have long-running tests that simulate this scenario and have not come across
it. Can you share more info on the number of indices and shards per index?
I can try to simulate it here.

-shay.banon

(In reply to dbenson, Wed, Dec 1, 2010 at 1:34 AM.)

Wow, brain fart on my part: we're on the 0.13.0 release.

_status dump is here https://gist.github.com/723879

I will try to simulate it; I have a fairly extensive test that checks that
data does not get lost in such cases. One last question: how do you tell that
the doc counts differ? Is that based on the index status API?

(In reply to dbenson, Wed, Dec 1, 2010 at 7:41 PM.)

Sorry for the delay in responding. The difference was visible via the
REST API when issuing a search for all docs across all indices:
http://localhost:9200/_search?q=*:*

Issuing the same search via the Java API didn't show a difference in the
document count. I had an opportunity to restart the cluster during a
planned maintenance window, and afterwards the document count
discrepancy no longer showed up in the REST API either.
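To check whether the REST count is stable, a rough sketch (Python standard library only) is to run the same match-all query from the thread several times and compare `hits.total`. The helper names are hypothetical. One assumption worth stating: if repeated identical searches can be served by different copies of a shard, alternating totals would suggest the copies disagree, whereas a single stable value would not. Because indexing was ongoing, counts can also legitimately grow between requests, so the comparison is only meaningful while indexing is paused.

```python
import json
import urllib.request


def total_hits(response_text):
    """Extract hits.total from a search response body.

    Older releases return an integer; newer releases return an object
    such as {"value": 123, "relation": "eq"}, so both shapes are handled.
    """
    total = json.loads(response_text)["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def fetch_total(url="http://localhost:9200/_search?q=*:*"):
    """Run the match-all query from the thread and return its total hit count."""
    with urllib.request.urlopen(url) as resp:
        return total_hits(resp.read().decode("utf-8"))


def distinct_totals(totals):
    """Return the sorted distinct counts seen across repeated queries."""
    return sorted(set(totals))
```

For example, `distinct_totals([fetch_total() for _ in range(10)])` returning more than one value would point at responses that disagree with each other rather than at a single stale count.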

David

(In reply to Shay Banon, Dec 2, 9:50 am.)