Possible indexing race condition for simultaneous add/delete

ppearcy · December 17, 2010, 12:06am

Hey,
I think I have stumbled upon a minor bug in 0.13.0. We have many
data sources flowing in, most of which have a healthy pattern of
adds/updates/deletes for each doc, i.e. in most cases there is a
good amount of delay between these operations.

We have a couple of data sources that have a somewhat sketchy setup,
resulting in nearly simultaneous adds, updates, and deletes for a
document. We are going to make changes on our side to address this
type of needless thrashing.

However, some automated testing I have set up caught 7 instances of a
document being available on one shard and not another. For example,
in a two-node cluster I can search for that document, and on every
refresh it will appear, then disappear, as the shards are
round-robined. I believe the only way an issue like this could
surface is a bug within ES. My random guess is some ordering or race
condition when a refresh occurs or when items are written to the
translog.
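
The appear/disappear symptom can be checked mechanically. Below is a minimal Python sketch, not the actual test harness: `doc_is_found` is a hypothetical callable that runs the same search once and returns whether the document came back, and the round robin between two shard copies is only simulated with `itertools.cycle`.

```python
import itertools
from typing import Callable, List


def observe_hits(doc_is_found: Callable[[], bool], attempts: int = 10) -> List[bool]:
    """Run the same search repeatedly and record whether the doc was returned.

    `doc_is_found` is a hypothetical callable supplied by the caller; it
    should issue one search and report whether the document was in the hits.
    """
    return [bool(doc_is_found()) for _ in range(attempts)]


def is_flapping(observations: List[bool]) -> bool:
    """True when identical searches disagree with each other."""
    return len(set(observations)) > 1


if __name__ == "__main__":
    # Simulated round robin over two copies: one has the doc, one does not.
    copies = itertools.cycle([True, False])
    print(is_flapping(observe_hits(lambda: next(copies), attempts=6)))
```

If no writes happen between the searches, any disagreement between runs points at the copies themselves rather than at indexing order on the client side.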

If the bug were an ordering problem on my side, I would end up with
documents out of sync with my data store, rather than shard replicas
out of sync with each other.

I think this is one of those problems that will be a pain to
reproduce. You'd need a cluster with at least two nodes and a test
app running against it with two threads, one issuing the add and
the other issuing the delete. Which one sticks would be
non-deterministic, but either way, there shouldn't be any drift
between the shard replicas.
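
A rough Python sketch of such a test app follows. It is an illustration under stated assumptions, not a reproduction of the bug: `add_doc` and `delete_doc` are hypothetical callables that would wrap the real index and delete requests, and `FakeShardPair` is an in-memory stand-in for a primary and a replica that applies every operation to both copies under one lock, so it should never drift.

```python
import threading
from typing import Callable, Iterable, List


def race_add_delete(add_doc: Callable[[str], None],
                    delete_doc: Callable[[str], None],
                    doc_id: str) -> None:
    """Fire an add and a delete for the same id from two threads at once."""
    barrier = threading.Barrier(2)  # release both threads together

    def run(op: Callable[[str], None]) -> None:
        barrier.wait()
        op(doc_id)

    threads = [threading.Thread(target=run, args=(op,))
               for op in (add_doc, delete_doc)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def find_drift(copies: List[Callable[[str], bool]],
               doc_ids: Iterable[str]) -> List[str]:
    """Return the ids whose presence differs between the given copies."""
    return [d for d in doc_ids if len({bool(c(d)) for c in copies}) > 1]


class FakeShardPair:
    """Hypothetical stand-in: two copies updated atomically together."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.primary: set = set()
        self.replica: set = set()

    def add(self, doc_id: str) -> None:
        with self._lock:
            self.primary.add(doc_id)
            self.replica.add(doc_id)

    def delete(self, doc_id: str) -> None:
        with self._lock:
            self.primary.discard(doc_id)
            self.replica.discard(doc_id)


if __name__ == "__main__":
    pair = FakeShardPair()
    ids = [str(i) for i in range(100)]
    for doc_id in ids:
        race_add_delete(pair.add, pair.delete, doc_id)
    checks = [lambda d: d in pair.primary, lambda d: d in pair.replica]
    print(find_drift(checks, ids))
```

Against a real cluster, the two `checks` would have to query each shard copy separately; how to do that is left out here because it depends on the version in use. Whether a given id ends up present or absent is allowed to vary from run to run; only a disagreement between copies counts as a failure.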

I don't consider this a major issue and it isn't causing me pain, but
I wanted to point out what I have observed.

Let me know if any more details would be of use.

Thanks,
Paul

kimchy · December 19, 2010, 12:47am

Let me try and reproduce this on my end, and I will ping back...
