_flush/synced command shows failures after being clean - indexing stopped

kredfern · June 4, 2021, 9:12am

Just wondering if the following behavior can be explained, so that we have the confidence to continue the rolling restart procedure. We are running a v7.4 cluster.

We have stopped all indexing into the cluster before these steps are taken.

Step 2 talks about issuing a _flush/synced command until no failures are seen.
"When you perform a synced flush, check the response to make sure there are no failures. Synced flush operations that fail due to pending indexing operations are listed in the response body, although the request itself still returns a 200 OK status. If there are failures, reissue the request."

We do this and reach a point where there are no failures, but when we reissue a synced flush 10s later we start seeing failures again. Is this OK and expected? Can we have confidence to continue the rolling restart process?
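For anyone wanting to script the "reissue until no failures" step quoted above, here is a minimal sketch. It is not taken from the Elasticsearch docs. The helper names (`failed_count`, `post_synced_flush`, `flush_until_clean`), the retry count, and the 10s delay are my own assumptions. It only reads the top-level `_shards.failed` counter shown in the output below.

```python
import json
import time
import urllib.error
import urllib.request


def failed_count(body):
    """Return the top-level failed shard count from a synced flush response."""
    return body.get("_shards", {}).get("failed", 0)


def post_synced_flush(base_url):
    """POST /_flush/synced against base_url and return the parsed JSON body."""
    url = base_url.rstrip("/") + "/_flush/synced"
    req = urllib.request.Request(url, method="POST")
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: if the server answers with an error status, the body
        # may still be the JSON report, so try to parse it anyway.
        return json.load(err)


def flush_until_clean(flush, attempts=10, delay=10.0, sleep=time.sleep):
    """Call flush() until it reports zero failed shards or attempts run out.

    Returns (attempts_used, last_body).
    """
    body = None
    for attempt in range(1, attempts + 1):
        body = flush()
        if failed_count(body) == 0:
            return attempt, body
        if attempt < attempts:
            sleep(delay)
    return attempts, body
```

Usage would look something like `flush_until_clean(lambda: post_synced_flush("http://localhost:9200"))`, where the URL is a placeholder. Note that a clean result from this loop only says the last request had no failures. It does not say a later request will also be clean, which is exactly the behavior being asked about here.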

"_flush/synced?pretty"
Thu 3 Jun 11:30:25 UTC 2021
{
  "_shards" : {
    "total" : 16708,
    "successful" : 16678,
    "failed" : 30
  },
..
..
Then, after reaching a point with no failed shards, we issue the command again (this time with more detail):

_flush/synced?pretty" | grep -i -B 1 -A 15 failures
Thu 3 Jun 12:09:52 UTC 2021
    "failed" : 1,
    "failures" : [
      {
        "shard" : 2,
        "reason" : "pending operations",
        "routing" : {
          "state" : "STARTED",
          "primary" : true,
          "node" : "4SO4lMH-TMik7l1rNBhBdA",
          "relocating_node" : null,
          "shard" : 2,
          "index" : "7_0df0304d_01b6_4cba_825f_58f36bbfdb2f-000001",
          "allocation_id" : {
            "id" : "gCT0LSolRai7XguuBMLvZw"
          }
        }
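As an alternative to grepping the pretty-printed output, the failures can be pulled out of the parsed response directly. This is only a sketch. The function name `list_failures` is hypothetical, and it assumes the response has one entry per index next to `_shards`, each with an optional `failures` list shaped like the fragment above.

```python
def list_failures(body):
    """Flatten a synced flush response into (index, shard, reason, node) rows."""
    rows = []
    for index, stats in body.items():
        # "_shards" is the cluster-wide summary, not an index entry.
        if index == "_shards" or not isinstance(stats, dict):
            continue
        for failure in stats.get("failures", []):
            # Assumption: "routing" may be absent for some failure types.
            routing = failure.get("routing") or {}
            rows.append(
                (index, failure.get("shard"), failure.get("reason"), routing.get("node"))
            )
    return rows
```

Collecting these rows across several runs would at least show whether the same index, shard, and node keep reporting "pending operations" or whether it moves around.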

Is there any way to know what these pending operations are? Should we still care once we have seen a single run with no failures?

Obviously, something else is going on behind the scenes.
Are there any other debug options that would help shed light on what is going on?
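One thing that might help is checking which shard copies actually carry a `sync_id` marker in their last commit, which shard-level index stats expose under `commit.user_data`. The sketch below works on an already-parsed stats response. It assumes the shape `indices.<index>.shards.<shard>` is a list of shard copies, each with a `commit.user_data` map. The function name is my own, so treat it as a rough aid rather than a supported tool.

```python
def shards_without_matching_sync_id(stats):
    """Return (index, shard, sync_ids) for shards whose copies lack a common sync_id."""
    problems = []
    for index, index_stats in stats.get("indices", {}).items():
        for shard, copies in index_stats.get("shards", {}).items():
            sync_ids = [
                ((copy.get("commit") or {}).get("user_data") or {}).get("sync_id")
                for copy in copies
            ]
            # Flag the shard if any copy has no marker or the markers differ.
            if None in sync_ids or len(set(sync_ids)) > 1:
                problems.append((index, shard, sync_ids))
    return problems
```

If a shard that reported "pending operations" shows up here after a supposedly clean run, that would suggest the marker was removed by a later commit rather than never written.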

Looking into the Elasticsearch code a little more, I see snippets like the following:

    if (indexWriter.hasUncommittedChanges()) {
        logger.trace("can't sync commit [{}]. have pending changes", syncId);
        return SyncedFlushResult.PENDING_OPERATIONS;
    }

https://lucene.apache.org/core/7_7_3/core/org/apache/lucene/index/IndexWriter.html#hasUncommittedChanges--
"Returns true if there may be changes that have not been committed. There are cases where this may return true when there are no actual "real" changes to the index, for example if you've deleted by Term or Query but that Term or Query does not match any documents. Also, if a merge kicked off as a result of flushing a new segment during commit(), or a concurrent merged finished, this method may return true right after you had just called commit()."

warkolm · June 8, 2021, 3:53am

Welcome to our community!

Was there any indexing going on during that time?

Please be aware that 7.4 is now EOL; you should upgrade to 7.13.

kredfern · June 8, 2021, 8:37am

Definitely no indexing was happening at the time when the _flush/synced commands were being run.

Yes, we realize that we are on an EOL version, and we are planning to upgrade soon.

