Missing data from replica shards after delete by query and index

ashokm · September 21, 2018, 6:58am

Hi All,

We are on ES 2.4.4. We use delete by query to delete all docs of a given type in an index, and then immediately index the data again. Sometimes the doc count on the replica is lower than on the primary. I am using the "_primary" and "_replica" preferences to get the counts. If we delete the entire index and index the data again, things are fine.

In Pre-Prod we have a 2-node cluster and in Prod we have a 6-node cluster. The issue happens in both environments. Each index has 2 shards and 1 replica. Can you please suggest what the root cause could be and how to either troubleshoot or fix the issue?

Thank you
Ashok

Christian_Dahlqvist · September 21, 2018, 7:02am

Have you waited for the operation to complete and run a refresh before getting the count?

ashokm · September 21, 2018, 1:00pm

Yes, we do wait and refresh. Here is the exact code we are using:

    DeleteByQueryResponse rsp = new DeleteByQueryRequestBuilder(client, DeleteByQueryAction.INSTANCE)
            .setIndices(INDEX)
            .setTypes(TYPE)
            .setSource(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()).size(5000).toString())
            .execute()
            .actionGet();
    RefreshResponse refreshResponse = client.admin().indices().refresh(new RefreshRequest(INDEX)).actionGet();

ashokm · September 21, 2018, 8:33pm

A couple of observations:

- Instead of delete by query, I scrolled through all docs and deleted them using bulk requests, and the same issue is still seen.
- We are not relying on auto-generated doc IDs.
- If we completely delete the index and re-index the whole data, there are no issues.

Could you please suggest what might be going wrong for us? Thank you so much.

Christian_Dahlqvist · September 22, 2018, 6:20am

Elasticsearch 2.4 is quite old and a lot of effort has gone into improving resiliency and durability in later versions. If I recall correctly, replication in Elasticsearch 2.x was asynchronous, so could be more susceptible to network issues. Is your cluster deployed within a single DC with fast and reliable connections between the nodes?

To be sure that you are indeed waiting for the job to complete and replication of the changes to finish, can you run the steps manually (verifying that they all have completed before continuing) and verify you see the same problem then?
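A minimal sketch of that manual check, assuming the cluster is reachable over HTTP and that the count API accepts the same "_primary" / "_replica" preference values mentioned earlier in this thread. The class name, the default host, and the index and type names are hypothetical placeholders, and the regex-based JSON parsing is a shortcut for illustration only. Run it only after the delete, the re-index, and a refresh have all completed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: compares the doc count served by primaries with the one served by replicas.
public class ShardCopyCountCheck {

    // Matches the top-level "count" field of a _count response, e.g. {"count":42,"_shards":{...}}
    private static final Pattern COUNT = Pattern.compile("\"count\"\\s*:\\s*(\\d+)");

    // Builds a _count URL restricted to the requested shard copies via the preference parameter.
    static String countUrl(String baseUrl, String index, String type, String preference) {
        return baseUrl + "/" + index + "/" + type + "/_count?preference=" + preference;
    }

    // Extracts the count value from the JSON body (regex shortcut, not a full JSON parser).
    static long parseCount(String json) {
        Matcher m = COUNT.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no count field in: " + json);
        }
        return Long.parseLong(m.group(1));
    }

    // Performs a plain HTTP GET and returns the response body as a string.
    static String httpGet(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream();
             Scanner s = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // All three values are placeholders; pass real ones on the command line.
        String base = args.length > 0 ? args[0] : "http://localhost:9200";
        String index = args.length > 1 ? args[1] : "my_index";
        String type = args.length > 2 ? args[2] : "my_type";

        long primary = parseCount(httpGet(countUrl(base, index, type, "_primary")));
        long replica = parseCount(httpGet(countUrl(base, index, type, "_replica")));

        System.out.println("primary=" + primary + " replica=" + replica);
        if (primary != replica) {
            System.out.println("MISMATCH: replica differs by " + (primary - replica) + " docs");
        }
    }
}
```

Running this once after each manual step (delete, refresh, re-index, refresh) should show at which step the two counts start to diverge.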

ashokm · September 22, 2018, 6:43am

Thank you, Christian, for your response.

All our machines are AWS EC2 instances in a single region but in different availability zones. As I mentioned above, we don't have this issue when the index is deleted completely and indexed again. The issue only happens when we delete the data for a few types and add it back. So I am assuming network connectivity is not the issue.

I will explore the manual option.

system · October 20, 2018, 6:43am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
