Nice, thanks for helping get to the bottom of this. I have changed the title as you suggested. I have now restored one of the affected indices from backup and saw the same thing again: shard 1 was still missing a sync id, which indeed suggests it didn't flush on its own:
{
"indices": {
"wsjiis-2018.01.21": {
"shards": {
"0": [
{
"commit": {
"id": "XemNxYf37qUO8/tesSgBqA==",
"generation": 15,
"user_data": {
"local_checkpoint": "14136041",
"max_unsafe_auto_id_timestamp": "1516492801248",
"translog_uuid": "Hj9op1G_QOW7WBrZMq65Nw",
"history_uuid": "GpWqTk4BQXuHe7KtyTWLUw",
"sync_id": "SNseJ0k-TUSgwATltPcHiA",
"translog_generation": "1",
"max_seq_no": "14136041"
},
"num_docs": 14136042
}
}
],
"1": [
{
"commit": {
"id": "BGwoqdC1ik2i1k65ZtpPuA==",
"generation": 14,
"user_data": {
"local_checkpoint": "14140162",
"max_unsafe_auto_id_timestamp": "1516492801248",
"translog_uuid": "jCNYUoUTSOeMeJNxcIs48Q",
"history_uuid": "kQIVkQzZR4615NPspXgGQA",
"translog_generation": "1",
"max_seq_no": "14140162"
},
"num_docs": 14140163
}
}
]
}
}
}
}
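For anyone else checking their indices, here is a minimal Python sketch that lists the shard copies whose last commit has no `sync_id` in its `user_data`. It assumes the shard-level `_stats` response (as above) has already been fetched and parsed into a dict. The function name `shards_missing_sync_id` is hypothetical, and the sample data is a trimmed-down copy of the output above.

```python
from typing import Any, Dict, List, Tuple


def shards_missing_sync_id(stats: Dict[str, Any]) -> List[Tuple[str, str, int]]:
    """Return (index, shard_id, copy_number) for every shard copy whose
    last commit carries no sync_id in its user_data."""
    missing = []
    for index_name, index_stats in stats.get("indices", {}).items():
        for shard_id, copies in index_stats.get("shards", {}).items():
            # Each shard id maps to a list of copies (primary and any replicas).
            for copy_no, copy in enumerate(copies):
                user_data = copy.get("commit", {}).get("user_data", {})
                if "sync_id" not in user_data:
                    missing.append((index_name, shard_id, copy_no))
    return missing


# Trimmed-down version of the first _stats output above.
before = {
    "indices": {
        "wsjiis-2018.01.21": {
            "shards": {
                "0": [{"commit": {"generation": 15,
                                  "user_data": {"sync_id": "SNseJ0k-TUSgwATltPcHiA"}}}],
                "1": [{"commit": {"generation": 14, "user_data": {}}}],
            }
        }
    }
}

print(shards_missing_sync_id(before))  # [('wsjiis-2018.01.21', '1', 0)]
```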
I then ran a manual /_flush/synced on it, and it completed successfully. Probably closing, backing up and restoring the index cleared whatever was making it think it had ongoing operations. Here's the _stats output afterwards:
{
"indices": {
"wsjiis-2018.01.21": {
"shards": {
"0": [
{
"commit": {
"id": "XemNxYf37qUO8/tesSgf9w==",
"generation": 16,
"user_data": {
"local_checkpoint": "14136041",
"max_unsafe_auto_id_timestamp": "1516492801248",
"translog_uuid": "Hj9op1G_QOW7WBrZMq65Nw",
"history_uuid": "GpWqTk4BQXuHe7KtyTWLUw",
"sync_id": "HobgXxZBSyqfr7eKr9wVrg",
"translog_generation": "1",
"max_seq_no": "14136041"
},
"num_docs": 14136042
}
}
],
"1": [
{
"commit": {
"id": "BGwoqdC1ik2i1k65Ztpt9Q==",
"generation": 15,
"user_data": {
"local_checkpoint": "14140162",
"max_unsafe_auto_id_timestamp": "1516492801248",
"translog_uuid": "jCNYUoUTSOeMeJNxcIs48Q",
"history_uuid": "kQIVkQzZR4615NPspXgGQA",
"sync_id": "MYtp2toUSlKZijBHaqfczg",
"translog_generation": "1",
"max_seq_no": "14140162"
},
"num_docs": 14140163
}
}
]
}
}
}
}
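To compare the two snapshots without reading them side by side, a rough Python sketch can pair up each shard copy's commit generation and sync_id before and after the flush. It assumes both responses are shard-level `_stats` output where every copy has a `commit` entry; `commit_info` and `diff_commits` are hypothetical helper names, and the sample dicts are trimmed copies of the two outputs above.

```python
from typing import Any, Dict, List, Optional, Tuple

Key = Tuple[str, str, int]


def commit_info(stats: Dict[str, Any]) -> Dict[Key, Tuple[int, Optional[str]]]:
    """Map (index, shard_id, copy_number) to (commit generation, sync_id or None).

    Assumes shard-level _stats output, where every shard copy has a "commit" entry.
    """
    info = {}
    for index_name, index_stats in stats["indices"].items():
        for shard_id, copies in index_stats["shards"].items():
            for copy_no, copy in enumerate(copies):
                commit = copy["commit"]
                info[(index_name, shard_id, copy_no)] = (
                    commit["generation"],
                    commit["user_data"].get("sync_id"),
                )
    return info


def diff_commits(before: Dict[str, Any], after: Dict[str, Any]) -> List[str]:
    """Describe how each shard copy's last commit changed between two snapshots."""
    old, new = commit_info(before), commit_info(after)
    lines = []
    for key in sorted(new):  # shard ids are strings, so this order is lexicographic
        gen_after, sync_after = new[key]
        gen_before, sync_before = old.get(key, (None, None))
        lines.append(
            f"{key[0]}[{key[1]}] copy {key[2]}: generation {gen_before} -> {gen_after}, "
            f"sync_id {sync_before} -> {sync_after}"
        )
    return lines


before = {"indices": {"wsjiis-2018.01.21": {"shards": {
    "0": [{"commit": {"generation": 15, "user_data": {"sync_id": "SNseJ0k-TUSgwATltPcHiA"}}}],
    "1": [{"commit": {"generation": 14, "user_data": {}}}],
}}}}
after = {"indices": {"wsjiis-2018.01.21": {"shards": {
    "0": [{"commit": {"generation": 16, "user_data": {"sync_id": "HobgXxZBSyqfr7eKr9wVrg"}}}],
    "1": [{"commit": {"generation": 15, "user_data": {"sync_id": "MYtp2toUSlKZijBHaqfczg"}}}],
}}}}

for line in diff_commits(before, after):
    print(line)
```

On the two outputs above, this reports that both shards moved up one commit generation and that shard 0's existing sync_id was replaced as well, which is consistent with the synced flush writing a fresh sync_id for every shard it succeeded on.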
I still have one more backed-up index that should be in the same state (no sync id), in case anyone has suggestions for different tests to run on it. Otherwise we will have to wait for the issue to reappear on a new index that also fails to complete a synced flush.
IIRC, the issue has appeared three times in total, and only since upgrading to 6.0.