Inconsistent search results from local/primary

Hi, ES-folk --

Ran into something odd today. For one index with a small number of documents, I see 2 results if I query with preference=local and 3 results if I query with preference=primary. (Without a preference set, the result bounces back and forth between 2 and 3 items.) In case it was just an issue with data settling, I waited a bit, but the mismatch is still there. The cluster seems happy. Is this a settings/configuration issue or a potential bug?
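
For concreteness, the two queries look roughly like this (index name and query are placeholders; I'm using the underscore-prefixed preference values from the search API):

  curl 'localhost:9200/myindex/_search?preference=_local&q=*:*&pretty=true'
  curl 'localhost:9200/myindex/_search?preference=_primary&q=*:*&pretty=true'

hits.total comes back as 2 for the first and 3 for the second.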

Additional context: Elasticsearch 0.17.6, a four-node cluster, an S3 gateway, and 5 shards with 1 replica each.

Thanks in advance!

-- Paul

It sounds like a potential problem; the shard copies should be synced up. You did not
change anything in the refresh interval, right?

Can you do the following and see if it still happens (there is a curl sketch of the whole sequence after the list):

  1. Issue a refresh and check.
  2. Issue a flush and check.
  3. Issue a refresh again and check.
  4. Issue a full flush (curl -XPOST localhost:9200/_flush?full=true) and check.
  5. Issue a refresh and check.
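
In curl terms, something like this, assuming the index is called myindex (adjust the name and query to yours; "check" means re-running the two preference queries and comparing hit counts):

  # 1. refresh, then check
  curl -XPOST 'localhost:9200/_refresh'
  curl 'localhost:9200/myindex/_search?preference=_primary&q=*:*'
  curl 'localhost:9200/myindex/_search?preference=_local&q=*:*'
  # 2. flush, then check
  curl -XPOST 'localhost:9200/_flush'
  # 3. refresh again, then check
  curl -XPOST 'localhost:9200/_refresh'
  # 4. full flush, then check
  curl -XPOST 'localhost:9200/_flush?full=true'
  # 5. refresh, then check
  curl -XPOST 'localhost:9200/_refresh'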

Hi, Shay --

The flush/refresh process was not effective. To add some additional
context, it looks like the missing document was ingested during HA
verification testing in which hosts were successively killed and
restarted, so my working theory is that the missing document (index
size 2 versus 3) was lost in a race between the master switchover and
the shard handoff.
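
For what it's worth, per-shard document counts can be compared with the index status API, which should show which shard copy is short a document (the index name is a placeholder, and I'm going from memory on the exact response fields):

  curl 'localhost:9200/myindex/_status?pretty=true'
  # compare docs.num_docs across the copies of each shard in the "shards" section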

Is there a way to have a single index rebuild its replicas from the
current primary?

-- Paul

If you restart the node with the offending shard, it will resync itself
against the other copy of the shard. But this should not happen... I have several
tests, both long-running and integration ones, that simulate nodes coming and
going while indexing to make sure it does not happen, and they all pass. Can
you maybe try to create a test case for this?
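
To locate the node holding the suspect copy, the routing table in the cluster state plus the nodes info should be enough, and then you bounce Elasticsearch on that host (how you restart it depends on how you launched it):

  # which node id holds each shard copy
  curl 'localhost:9200/_cluster/state?pretty=true'
  # map node ids to hostnames/addresses
  curl 'localhost:9200/_cluster/nodes?pretty=true'
  # then restart the Elasticsearch process on that host with whatever
  # init script or wrapper you normally use; on rejoin the shard
  # recovers from the primary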

I'll give it a go. It's some combination of network partitions,
cluster membership, etc., with the partition being the more difficult
thing to simulate without multiple hosts.
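
Roughly, the shape of the scenario I'll try to script is this (hosts, pid file, startup command, and index name are all placeholders for however the cluster is actually run):

  # steady trickle of small documents
  while true; do
    curl -s -XPOST 'localhost:9200/testidx/doc' -d '{"ts": "'"$(date +%s)"'"}'
    sleep 0.1
  done &

  # meanwhile, kill and restart nodes in turn
  for host in node1 node2 node3 node4; do
    ssh "$host" 'kill -9 $(cat /path/to/elasticsearch.pid)'
    sleep 30
    ssh "$host" '/path/to/bin/elasticsearch'
    sleep 60
  done

  # afterwards, compare hit counts per preference
  curl 'localhost:9200/testidx/_search?preference=_primary&q=*:*'
  curl 'localhost:9200/testidx/_search?preference=_local&q=*:*'

The network-partition piece still needs multiple hosts (or some firewall games) on top of this.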
