Inconsistent search results


(Colin Surprenant) #1

Hi,

I am seeing inconsistent search results for the same query.

My cluster has 7 nodes running 0.18.5: 6 http+data nodes and 1 http-only node.

On one specific index configured as 1 shard and 5 replicas, when I
search using a very simple ?q=term query, the result count varies
across the search queries. I noticed a pattern in the result counts: x
x x y x y, x x x y x y, ... the result count pattern repeats every 6
requests.

I also noticed that search requests spanning multiple indices,
including this one, would every few requests simply return no
results from this specific index.

There are no error logs, and the health status is green.

I am not sure where to look to diagnose this problem. Any suggestions?

Thanks,
Colin
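
One possible reading of the repeating pattern above, sketched in Java: assume each search is served by the next of the 6 shard copies (1 primary + 5 replicas) in a fixed rotation, and that two of those copies report a different count. The class name, the counts (100 and 97) and the positions of the two odd copies are hypothetical, chosen only to reproduce the x x x y x y shape; this is not taken from the Elasticsearch source.

```java
import java.util.ArrayList;
import java.util.List;

class RoundRobinPattern {
    // Returns the hit counts seen by `requests` consecutive searches when each
    // search is answered by the next shard copy in strict rotation.
    static List<Integer> observedCounts(int[] countPerCopy, int requests) {
        List<Integer> seen = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            seen.add(countPerCopy[i % countPerCopy.length]);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical counts: four copies agree (100), two differ (97).
        int[] countPerCopy = {100, 100, 100, 97, 100, 97};
        // Prints [100, 100, 100, 97, 100, 97, 100, 100, 100, 97, 100, 97]
        System.out.println(observedCounts(countPerCopy, 12));
    }
}
```

Under these assumptions the sequence repeats with a period equal to the number of copies, which would match a cycle of 6 requests.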


(Colin Surprenant) #2

I ran a simple search query against each node, using
&preference=_only_node:xyz to restrict the search to that node. I have 6
data nodes, and this particular index is configured as 1 shard and 5
replicas, so each data node holds a full copy of the index.

2 out of the 6 nodes return a different result set from the other 4.
These two seem to return the same differing result set as each other.

I issued a _refresh on this index and it fixed the problem. Strange.
Isn't _refresh automatically called every second by default? Is there
something I am missing here?

Colin
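
The per-node comparison described above could be scripted roughly as follows. This is only a sketch: the base URL, index name, node ids and hit counts are made up, and the code just builds the request URLs and groups already-collected counts; it does not talk to a cluster. Only the `preference=_only_node:<id>` parameter comes from the post itself.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PerNodeCheck {
    // Builds a search URL pinned to a single node via preference=_only_node:<id>.
    static String searchUrl(String base, String index, String term, String nodeId) {
        String q = URLEncoder.encode(term, StandardCharsets.UTF_8);
        return base + "/" + index + "/_search?q=" + q + "&preference=_only_node:" + nodeId;
    }

    // Groups node ids by the total hit count each node reported, so that
    // nodes disagreeing with the majority stand out.
    static Map<Long, List<String>> groupByCount(Map<String, Long> hitsPerNode) {
        Map<Long, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> e : hitsPerNode.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical host, index and node id.
        System.out.println(searchUrl("http://localhost:9200", "myindex", "term", "xyz"));

        // Hypothetical counts, as if collected by running the URL above once per node.
        Map<String, Long> hits = new LinkedHashMap<>();
        hits.put("node1", 100L);
        hits.put("node2", 100L);
        hits.put("node3", 100L);
        hits.put("node4", 97L);
        hits.put("node5", 100L);
        hits.put("node6", 97L);
        System.out.println(groupByCount(hits));
    }
}
```

Any count with only a few nodes behind it points at the copies worth inspecting first.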



(Lukáš Vlček) #3

Hi,

Are you able to recreate it? I mean, if you drop your indices, recreate them and reindex your documents, do you get this issue again?
Btw, how many documents do you have, and was your index 1 shard with 5 replicas from the beginning, or did you, for example, start with just 1 shard and increase the number of replicas later (after or during indexing)?

--
Regards,
Lukas



(Colin Surprenant) #4

I haven't tried to recreate it on this cluster (it's in production), but
I have a parallel cluster which indexes the same documents and did not
have this problem. Note that the production cluster went through a few
node restarts, shard relocations/(re)initializations...

This particular index is very small, about 7000 documents, and the 1
shard, 5 replicas configuration was set up at index creation using a template.

Again, what puzzles me is the fact that the _refresh API call actually
"fixed" this. Before that, the behaviour had been present for quite
some time while I was trying to diagnose it, until I issued the _refresh.
AFAIU there is an auto-refresh every second by default. How come a
manual refresh worked while the auto-refresh did not?

Colin



(Weiwei Wang) #5

Use setPreference to avoid this problem:

searchRequestBuilder.setPreference(account);
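
As a rough illustration of why a custom preference string helps, the Java sketch below maps a string to one of the shard copies by hashing it, so the same string always lands on the same copy. This is an assumption made for illustration, not Elasticsearch's actual routing code; `PreferenceRouting` and `copyFor` are hypothetical names, and `account` above is presumably a per-user string in the poster's own application.

```java
class PreferenceRouting {
    // Picks a copy index in [0, copies) from the preference string.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int copyFor(String preference, int copies) {
        return Math.floorMod(preference.hashCode(), copies);
    }

    public static void main(String[] args) {
        // The same (hypothetical) account string selects the same copy every time.
        System.out.println(copyFor("account-42", 6));
        System.out.println(copyFor("account-42", 6));
    }
}
```

Note that this only makes results stable per preference string; if the copies themselves disagree, different strings can still see different results.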



(Clinton Gormley) #6

Hi Colin

Again, what puzzles me is the fact that the _refresh API call actually
"fixed" this. Before that, the behaviour had been present for quite
some time while I was trying to diagnose it, until I issued the _refresh.
AFAIU there is an auto-refresh every second by default. How come a
manual refresh worked while the auto-refresh did not?

Is it possible that you disabled the auto-refresh on that index?

Have a look at your index settings

clint
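
A minimal Java sketch of the check suggested here, assuming the index settings have already been fetched into a flat key/value map, and assuming the relevant key is `index.refresh_interval` with `-1` meaning periodic refresh is turned off. Both the key handling and the class name are assumptions for illustration; verify the setting name against the documentation for your version.

```java
import java.util.Map;

class RefreshSettingCheck {
    // Returns true only when the settings explicitly switch periodic refresh off.
    // A missing key is treated as "default interval applies".
    static boolean autoRefreshDisabled(Map<String, String> indexSettings) {
        String value = indexSettings.get("index.refresh_interval");
        return value != null && value.trim().equals("-1");
    }

    public static void main(String[] args) {
        // Hypothetical settings maps.
        System.out.println(autoRefreshDisabled(Map.of("index.refresh_interval", "-1")));
        System.out.println(autoRefreshDisabled(Map.of("index.number_of_replicas", "5")));
    }
}
```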


(Colin Surprenant) #7

Hi,

Nope, auto-refresh was/is not disabled.

BUT! The cluster went berserk last night, and that probably explains its
erratic behaviour. It started when I tried creating a new index with
2 shards & 2 replicas set by template. The cluster immediately went into a
yellow state and got stuck at "initializing_shards: 2". After waiting
a bit, I figured I'd restart the nodes on which the shards were stuck
initializing. The first node I stopped wouldn't reconnect to the
cluster (using ec2 discovery); it looped endlessly trying to
reconnect. I tried a few more stop/starts, without success. After making
sure it wasn't related to any networking issue, I decided to restart
the second node on which a shard was stuck initializing. Of course,
the 2nd node did the same thing.

At that point I figured it must be the master node having problems. I
restarted the master node and that actually fixed cluster discovery:
all nodes reconnected and the cluster came back up after a tedious
recovery :P

Now, the cluster is not showing any more problems, no more result
inconsistencies and new index creation works.

What could I have done better to diagnose the problem, and maybe avoid
restarting these two nodes and focus right away on the master node?
Is there anything I should have looked at when the cluster went into the
yellow state to help with the troubleshooting?

Thanks,
Colin



(Shay Banon) #8

Can you gist the logs of the master node?


