BUG: Alternating result set across every query


(Antonio Lobato) #1

Hi everyone!

So we're implementing ElasticSearch in a few production systems, and we've
run into this show-stopper of a bug. Here's the setup:

  • Cluster of 5 servers.
  • ~500 gigs of data per server (n+1 redundancy for all indexes)
  • ~3-4 indexes.
  • Unicast clustering.
  • 16 gigs of ram per box, ~60% allocated to Java heap.
  • No swapping/memory issues.

After an indeterminate amount of time, running a query like this:

server:9200/index/_search?pretty=true

will return a certain number of results, say, 123,456. However, if you run
the exact same query on the same server a second time, the result count
(and data set) will be entirely different, e.g. 122,222. Run it again, and
you get the first result set. It will alternate indefinitely until a full
cluster restart is done. A few things I have noticed:

  • It sometimes starts when a server drops out or goes offline.
  • It can also start without any server going offline at all.
  • The specific query does not matter; results alternate regardless.
  • Calling a _flush on an index does not fix it.
  • It can affect one index at one moment and not another, but it
    eventually happens to all of them.
  • The alternating results happen only on a single cluster member, not
    on all of them.

Ideas? Thanks!


(David Pilato) #2

I saw the same behaviour today.
In my case it was caused by faulty node discovery.

I have 2 nodes.
Node 1 acts as if it were alone.
Node 2 sees both nodes.

My (transport) client is aware of both nodes.

When I search the cluster, I hit either node1 or node2, so I get different results each time.

I have not fixed it yet, but I plan to simply shut down node1, clean its data directory, and restart it.

Perhaps you are hitting the same issue.

David



(Antonio Lobato) #3

We use unicast for the express purpose of avoiding that particular issue.
Note that this happens when running the query against the SAME node.
Weird, isn't it?



(David Pilato) #4

Yes. I use Unicast too.

David.



(Antonio Lobato) #5

Hm, we're not facing a split-brain like you. I wish it were something that
easy to figure out. :frowning:


(Andy Wick) #6

I think by default queries alternate between the primary and replica
shards, so you could try the preference parameter and see if you get
consistent results. If that fixes it, it would likely mean some kind of
replication delay or issue.

http://www.elasticsearch.org/guide/reference/api/search/preference.html

ex: server:9200/index/_search?pretty=true&preference=_primary_first
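To see why this would produce strictly alternating totals: with one replica per shard, the default routing bounces each request between two copies of the data, so a stale replica shows up as every other response. A toy sketch of that behaviour (plain Python, not Elasticsearch code; the hit counts are the hypothetical ones from the original post):

```python
from itertools import cycle

# One shard with a primary and one stale replica. This is just a toy
# model of round-robin routing across shard copies, not real ES code.
copies = [("primary", 123456), ("replica", 122222)]
round_robin = cycle(copies)

def search(preference=None):
    """Hit count returned by whichever shard copy serves the request."""
    if preference == "_primary_first":
        role, count = copies[0]          # always served by the primary
    else:
        role, count = next(round_robin)  # default: copies take turns
    return count

# Without a preference, the same query flips between two totals.
print([search() for _ in range(4)])
# -> [123456, 122222, 123456, 122222]

# Pinned to the primary, the count is stable.
print([search("_primary_first") for _ in range(4)])
# -> [123456, 123456, 123456, 123456]
```

If `preference=_primary_first` makes the count stop flapping, the primaries and replicas really do disagree, and the question becomes why the replicas drifted.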


(Antonio Lobato) #7

Wow, that's a great find. I can't test it at the moment, but this seems
like it would be the issue.

So I guess my next question is -- why are the replicas out of date? This
issue happened this morning on an index that is 200 million documents
strong, but the index has not been updated for 12+ hours. I can't imagine
it would take 12 hours to sync up. Further, I've had this happen on an
index that has only 3 million documents. Any ideas on what to look for in
troubleshooting replication delays?


(Filirom1) #8

I can confirm that I often see this issue (I am testing ES, so I often
erase everything and then index new documents):

hits.total: 25,256,626
hits.total: 25,255,381

Adding preference=_primary_first fixes hits.total at 25,255,381.

I have 3 nodes.

Cheers
Romain



(Antonio Lobato) #9

Good to know that I'm not the only one seeing this. I wonder what's going
on.


(Andy Wick) #10

Curious that the replicas seem to have MORE documents. Are you doing
DELETEs?


(Filirom1) #11

No DELETEs, only POSTs with random IDs.



(John Ohno) #12

If a node crashes in a way that prevents shard recovery from later
completing properly, and a replica on that node ends up smaller but not
corrupted, that replica will return a smaller subset of documents. I had
exactly this problem after running out of disk space on one node.

Check whether one of your nodes is, or was at some point, out of disk
space, and force it to regenerate all its replicas.
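For anyone needing to force that regeneration: one common approach is to drop the index's replica count to zero and then restore it, which discards the suspect replica shards and re-copies them from the primaries. This is a sketch against a hypothetical host and index name, and it assumes the primaries hold the good copies, so verify that first; adjust the final replica count to match your own setup:

```shell
# Drop all replicas for the index -- this deletes the (possibly stale)
# replica shards. "server:9200" and "index" are placeholders.
curl -XPUT "server:9200/index/_settings" \
     -d '{"index": {"number_of_replicas": 0}}'

# Restore the replica count (1 here, matching an n+1 setup); the
# replicas are rebuilt fresh from the current primaries.
curl -XPUT "server:9200/index/_settings" \
     -d '{"index": {"number_of_replicas": 1}}'
```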



(Antonio Lobato) #13

None of our nodes were out of space at any point in time. :confused:


(Moshe Sucaz) #14

Hi, how did you solve it? I have the same issue now with Elasticsearch 2.3.4.

