Might also be relevant to add that this problem index gets a lot of updates, and the updates happen via a Painless script:
{
  "script": {
    "source": "ctx._source.child_ids.add(params.child_id)",
    "params": {
      "child_id": 1234
    }
  }
}
The script maintains the child_ids array, which has an average length of 3 but a maximum length of 100,000.
Perhaps if all 100,000 of those Painless updates happened to a given document in series it would place an unusual stress on the shard?
Reading @nik9000's post about how Lucene handles _source has me wondering.
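To put a rough number on that worry, here is a back-of-the-envelope sketch in Python. It assumes each scripted update fetches the document, applies the script, and re-indexes the whole _source, so an array that grows one element per update gets rewritten in full every time. The function name is made up for illustration, and the model ignores segment merges, refreshes, and the translog, so treat it as an estimate of the growth pattern rather than a measurement.

```python
def elements_rewritten(final_length: int) -> int:
    """Total array elements written if the array grows one append at a time
    and every update rewrites the entire document (assumed behaviour)."""
    # Update k writes an array of length k, so the total is 1 + 2 + ... + n.
    return sum(range(1, final_length + 1))


# Typical document: array of 3 child ids.
typical = elements_rewritten(3)

# Worst case from the post: 100,000 appends to one document in series.
worst = elements_rewritten(100_000)

print(f"typical document: {typical} elements written in total")
print(f"worst-case document: {worst:,} elements written in total")
```

If that assumption holds, the cost grows quadratically with the array length, so the one outlier document would account for vastly more write work than any typical document.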