I'm new here but not new to the Elastic stack, so I apologise if I get anything wrong in reporting this bug.
This bug exists in Elasticsearch 7.8. I know it doesn't exist in 6.8 but I haven't looked at all the in-between versions.
If I send a bulk request that mixes index requests, some of which use an ingest pipeline and some of which don't, things go wrong when the pipeline either fails or contains a drop processor. The request handling uses the wrong "slot" number in the bulk request / response: the "executeBulkRequest" method of "IngestService" counts only the requests that have a pipeline when working out the "slot" for the callback. As a result, the wrong items are marked as failed or dropped. I haven't experimented much with failures, so that case needs verifying, but here are the steps to reproduce it with the drop processor.
curl -s -XPUT -H Content-type:application/json localhost:9200/_ingest/pipeline/mypipeline -d '{ "processors": [ { "drop": { } } ] }'
My bulk request (saved in a file named bulk_request)...
{"index":{"_index":"myindex","_type":"_doc","_id":"1"}}
{"test":"1"}
{"index":{"_index":"myindex","_type":"_doc","_id":"2","pipeline":"mypipeline"}}
{"test":"2"}
curl -s -XPOST -H Content-type:application/json localhost:9200/_bulk --data-binary @bulk_request
The response...
{
  "took": 242,
  "ingest_took": 11,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "myindex",
        "_type": "_doc",
        "_id": "1",
        "_version": -3,
        "result": "noop",
        "_shards": {
          "total": 0,
          "successful": 0,
          "failed": 0
        },
        "status": 200
      }
    },
    {
      "index": {
        "_index": "myindex",
        "_type": "_doc",
        "_id": "2",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
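And if you search the index, it contains the wrong document. Document 2, which went through the drop pipeline, was indexed, while document 1, which had no pipeline, is missing...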
{
  "took": 55,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "myindex",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test": "2"
        }
      }
    ]
  }
}
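To make the suspected slot mismatch concrete, here is a tiny Python simulation. It is not the Elasticsearch code: run_bulk and count_all_slots are hypothetical names made up for illustration, and the sketch assumes every pipeline simply drops its document. It only shows how counting pipeline requests alone would point the "dropped" callback at the wrong bulk item.

```python
# Illustrative simulation only; this is NOT the Elasticsearch source.
# run_bulk and count_all_slots are hypothetical names.

def run_bulk(requests, count_all_slots):
    """Return the ids of the bulk items that end up marked as dropped.

    requests: list of (doc_id, pipeline_or_None) in bulk order.
    count_all_slots: if False, the slot counter advances only for
    requests that have a pipeline (the suspected bug); if True, it
    advances for every request, so it matches the bulk position.
    """
    dropped_slots = []
    slot = 0
    for doc_id, pipeline in requests:
        if pipeline is None:
            # Requests without a pipeline skip ingest entirely.
            if count_all_slots:
                slot += 1
            continue
        # Assumption: every pipeline here contains a drop processor.
        dropped_slots.append(slot)
        slot += 1
    # The caller treats each slot as a position in the bulk request.
    return [requests[s][0] for s in dropped_slots]


# Same shape as the bulk request above: doc 1 has no pipeline, doc 2 does.
bulk = [("1", None), ("2", "mypipeline")]
print(run_bulk(bulk, count_all_slots=False))  # ['1'] (wrong item dropped)
print(run_bulk(bulk, count_all_slots=True))   # ['2'] (expected behaviour)
```

With the pipeline-only count, the drop lands on slot 0, which is document 1 in the bulk request, matching the "noop" reported for document 1 in the response above.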
I believe the fix should be relatively simple in IngestService but maybe I'm missing something.