Elasticsearch reindex with ELSER pipeline succeeds but generates embeddings for only a fraction of documents - no failures reported

I'm reindexing documents with an ELSER inference pipeline to generate embeddings, but only a fraction of documents (usually around half) end up with embeddings despite the reindex completing successfully with no failures.

Setup:

  • Elasticsearch with ELSER model deployed and running

  • Source index: ~1000 documents

  • Destination index with rank_features mapping for embeddings

  • Pipeline: elser_double_embedding_pipeline

Reindex Request:

POST /_reindex
{
  "source": {
    "index": "v1214_test_staging_20260506_181845"
  },
  "dest": {
    "index": "v1214_test",
    "pipeline": "elser_double_embedding_pipeline"
  }
}

Task Status (Completed Successfully):

{
  "completed": true,
  "task": {
    "description": "reindex from [v1214_test_staging_20260506_181845] to [v1214_test]",
    "status": {
      "total": 977,
      "updated": 977,
      "created": 0,
      "deleted": 0,
      "batches": 2,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 2,
        "search": 0
      }
    }
  },
  "response": {
    "total": 977,
    "updated": 977,
    "failures": []
  }
}

Problem: Despite successful completion, only 450 of 977 documents have both embeddings. Counting the documents that are missing at least one embedding field:

GET /v1214_test/_count
{
  "query": {
    "bool": {
      "should": [
        {"bool": {"must_not": {"exists": {"field": "name_embedding"}}}},
        {"bool": {"must_not": {"exists": {"field": "content_embedding"}}}}
      ],
      "minimum_should_match": 1
    }
  }
}
// Returns: 527 documents missing embeddings

Pipeline Configuration:

GET /_ingest/pipeline/elser_double_embedding_pipeline
{
  "elser_double_embedding_pipeline": {
....
    "processors": [
      {
        "inference": {
          "model_id": ".elser_model_2_linux-x86_64",
          "input_output": [
            {
              "input_field": "content",
              "output_field": "content_embedding"
            },
            {
              "input_field": "name",
              "output_field": "name_embedding"
            }
          ]
        }
      }
    ]
  }
}

What I've Tried:

  1. Running update_by_query with pipeline - same result

  2. Checking for failures in task response - none reported

  3. Verifying source documents have name and content fields - they do

  4. Testing pipeline simulation - works correctly

  5. Running reindex multiple times - consistently only ~46% of documents get embeddings
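
A simulation against one of the documents that ended up without embeddings is probably more telling than one against an arbitrary document. A minimal sketch, assuming a placeholder document ID and placeholder field values (substitute the real ID and source of an affected document from the destination index):

```
// _id and _source values below are placeholders, not real data
POST /_ingest/pipeline/elser_double_embedding_pipeline/_simulate?verbose=true
{
  "docs": [
    {
      "_index": "v1214_test",
      "_id": "REPLACE_WITH_MISSING_DOC_ID",
      "_source": {
        "name": "placeholder name",
        "content": "placeholder content"
      }
    }
  ]
}
```

With verbose=true the response lists the result of each processor separately, so an error raised by the inference processor should be visible there even if it never reaches the reindex failures list.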

Questions:

  1. Why would reindex report success but not apply the pipeline to all documents?

  2. How can I debug which documents are failing to get embeddings when no failures are reported?

  3. Is there a way to force the pipeline to process all documents or identify which ones were skipped?
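
For question 2, one starting point is to turn the count query above into a search that returns the affected documents themselves. This is only a sketch: the size of 20 is an arbitrary choice, and it assumes name and content are stored in _source.

```
// Same bool query as the _count request, but returning IDs and input fields
GET /v1214_test/_search
{
  "size": 20,
  "_source": ["name", "content"],
  "query": {
    "bool": {
      "should": [
        {"bool": {"must_not": {"exists": {"field": "name_embedding"}}}},
        {"bool": {"must_not": {"exists": {"field": "content_embedding"}}}}
      ],
      "minimum_should_match": 1
    }
  }
}
```

Comparing these documents with ones that did get embeddings (for example, whether their text is unusually long or empty) might show what the skipped documents have in common.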

The task description doesn't show the pipeline name (should show [elser_double_embedding_pipeline]), which makes me suspect the pipeline isn't being applied, but the API accepts the parameter without error.
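
One way to check whether the pipeline is actually invoked, independent of the task description, is the per-pipeline ingest statistics from the node stats API. A sketch, assuming the counters are read once before and once after a reindex run, since they are cumulative since node start:

```
// Per-node ingest counters, filtered to this one pipeline
GET /_nodes/stats/ingest?filter_path=nodes.*.ingest.pipelines.elser_double_embedding_pipeline
```

Each node's entry reports a count and a failed value for the pipeline. If count summed across nodes grows by roughly 977 per run, the pipeline is being applied to every document and the problem lies inside the inference step rather than in the reindex request.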

Have you checked the Elasticsearch logs on all nodes? I wonder whether something failed silently because the model was not available on a node where it was expected to be (just a guess). If there is nothing in the logs, it might be worth rerunning with debug-level logging for the org.elasticsearch.xpack.ml.action package.
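
In case it helps, here is roughly how I would check both things. This is a sketch that assumes the model ID from your pipeline and uses a persistent cluster setting for the logger, which should be reset afterwards:

```
// Per-node deployment statistics for the ELSER model
GET /_ml/trained_models/.elser_model_2_linux-x86_64/_stats

// Raise logging for the ML action package to DEBUG
PUT /_cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.xpack.ml.action": "DEBUG"
  }
}

// Reset the logger to its default level once done
PUT /_cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.xpack.ml.action": null
  }
}
```

In recent versions the stats response includes per-node counters under deployment_stats (such as error, timeout, and rejected-execution counts), which may show whether inference requests were dropped on a particular node.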

Also, could you share your pipeline? Does your inference processor have ignore_failure set to true?
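
Either way, one option is a debug copy of the pipeline that records the failure reason on the document instead of dropping it. A sketch based on the processor shown above; the pipeline name elser_double_embedding_debug and the field name embedding_error are made up for illustration:

```
// Hypothetical debug pipeline: same inference step, plus an on_failure handler
PUT /_ingest/pipeline/elser_double_embedding_debug
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2_linux-x86_64",
        "input_output": [
          {"input_field": "content", "output_field": "content_embedding"},
          {"input_field": "name", "output_field": "name_embedding"}
        ],
        "on_failure": [
          {
            "set": {
              "field": "embedding_error",
              "value": "{{{_ingest.on_failure_message}}}"
            }
          }
        ]
      }
    }
  ]
}
```

Reindexing into a scratch index with this pipeline and then running an exists query on embedding_error should list the documents whose inference failed, along with the error message, if failures are in fact the cause.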