It might also be relevant that this problem index gets a lot of updates, all made via a Painless script:
{
  "script": {
    "source": "ctx._source.child_ids.add(params.child_id)",
    "params": {
      "child_id": 1234
    }
  }
}
The script maintains an array whose average length is 3 but whose maximum length is 100,000.
If all 100,000 of those Painless updates were applied to a single document in series, could that place unusual stress on the shard?
Reading @nik9000's post about how Lucene handles _source has me wondering.
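Elasticsearch doesn't modify a document in place. The update API fetches the existing `_source`, runs the script, and reindexes the whole document, marking the old version as deleted. So growing one array to 100,000 entries one append at a time rewrites the array roughly n²/2 times in total. Here's a rough back-of-the-envelope sketch of that cost. The per-ID size (`bytes_per_id`) and the size of the rest of the document (`base_bytes`) are made-up assumptions, not measurements from this index.

```python
def total_source_bytes_written(n_updates: int, bytes_per_id: int = 8, base_bytes: int = 200) -> int:
    """Estimate the cumulative _source bytes reindexed when an array grows by one ID per update.

    Assumptions (hypothetical, for illustration only):
      - each update reindexes the full _source (no in-place modification),
      - each array element costs `bytes_per_id` bytes once serialized,
      - the rest of the document costs a constant `base_bytes` bytes.
    """
    total = 0
    for k in range(1, n_updates + 1):
        # After the k-th update the array holds k IDs, and the whole document is rewritten.
        total += base_bytes + k * bytes_per_id
    return total


def closed_form(n_updates: int, bytes_per_id: int = 8, base_bytes: int = 200) -> int:
    # Same quantity in closed form: n * base + bytes_per_id * n(n + 1) / 2.
    return n_updates * base_bytes + bytes_per_id * n_updates * (n_updates + 1) // 2


if __name__ == "__main__":
    typical = total_source_bytes_written(3)        # an average-sized document
    worst = total_source_bytes_written(100_000)    # the 100,000-element outlier
    print(f"typical doc: {typical:,} bytes; worst doc: {worst:,} bytes")
```

Under these assumed sizes, the typical document costs a few hundred bytes of rewrites, while the 100,000-element document costs on the order of 40 GB of `_source` reindexing. That suggests the quadratic growth, rather than the final document size, is the thing to watch.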