Elasticsearch primary and replica shards not in sync after bulk load

Hi Guys,

My team has been running Elasticsearch on AWS EC2 for search for about two years, and we keep being bothered by an out-of-sync issue between primary and replica shards. Our cluster has mainly two indices, each with 6 primary shards and 2 replicas per shard (so 6 primary shards and 12 replica shards, 18 shards in total per index). One index is used for searching, so it holds only partial data but has more fielddata in its mapping. The other holds the full data but is only queried by ID. Every Monday our elasticsearch-consumer bulk loads the same dataset into both indices.
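For context, the weekly load goes through the standard _bulk API; a minimal sketch of one batch (the document IDs and body fields here are illustrative, not our real data):

```
POST _bulk
{ "index": { "_index": "search-index", "_type": "search-type", "_id": "42" } }
{ "publishDate": "2018-04-02T00:00:00Z", "title": "..." }
{ "index": { "_index": "search-index", "_type": "search-type", "_id": "43" } }
{ "publishDate": "2018-04-02T00:00:00Z", "title": "..." }
```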

The current issue is this: after the bulk load of the latest data, we also run a bulk delete of any documents that were not updated after the timestamp at which the bulk load began. After this has been running for a while, a query against search-index/search-type/_search?sort=publishDate still shows a few documents published one or two months ago live in the index. When I hit the stats API with _stats?level=shards, the results show that primary and replica shards have different document counts.
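For anyone trying to reproduce the check: in the shard-level stats response, each shard number lists all of its copies, and routing.primary distinguishes primary from replica, so a mismatch in docs.count between copies of the same shard is what shows the divergence. Roughly:

```
GET search-index/_stats?level=shards

// in the response, compare for each copy of shard 0:
//   indices.search-index.shards.0[*].docs.count
//   indices.search-index.shards.0[*].routing.primary
```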

Also, if I run the timestamp query repeatedly, Elasticsearch returns different results on different tries. Sometimes the total is 0, but sometimes it is 6, 8, or more. If I set the preference to _primary, the result is always 0, which is what we want. Correspondingly, if I change the preference to _replica, I see more than 0 results.
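The check I am running is essentially this (the cutoff in the range query is illustrative; publishDate is the real field):

```
GET search-index/search-type/_search?preference=_primary
{
  "size": 0,
  "query": {
    "range": { "publishDate": { "lt": "now-4w" } }
  }
}
```

Swapping the preference to _replica on the same request returns a non-zero total.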

All the findings above indicate that, in our consumer, after the bulk load and bulk delete (there is a 15-minute interval between the two operations), Elasticsearch does not successfully sync up the shard copies. I tried running _flush/synced, but it fails because we keep indexing data in the meantime. It is not possible for us to pause indexing and do the flush.
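For reference, the call that fails is the synced flush; when there are ongoing operations on a shard, the response reports those shard copies in a failures section rather than marking them synced:

```
POST search-index/_flush/synced
```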

Does anyone have any thoughts about solving this issue? Thanks in advance.

What version are you on?

The current version is 5.1.1, but we are planning to upgrade to 6.2.3.

Does anyone have any thoughts? Could it be a bug?

Are you changing the refresh interval during bulk load? If so, do you run a manual refresh once the bulk upload has completed?
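For reference, the pattern being asked about is disabling the refresh interval for the duration of the load and restoring it afterwards, followed by an explicit refresh (the "1s" value restored here is the default):

```
PUT search-index/_settings
{ "index": { "refresh_interval": "-1" } }

// ... run the bulk load ...

PUT search-index/_settings
{ "index": { "refresh_interval": "1s" } }

POST search-index/_refresh
```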

We did not change the refresh interval during the bulk load. Yes, we manually refresh the index and then run _flush/synced, but it fails because of some pending operations. The thing is, I can still see documents that were published a few weeks ago. I am not sure why Elasticsearch fails to remove them in the delete pass that follows the bulk load.

Do you have any cluster or index settings that are not standard? How many nodes do you have in the cluster? What does your elasticsearch.yml file look like?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.