I'm reindexing documents with an ELSER inference pipeline to generate embeddings, but only a fraction of documents (usually around half) end up with embeddings despite the reindex completing successfully with no failures.
Setup:
- Elasticsearch with ELSER model deployed and running
- Source index: ~1000 documents
- Destination index with rank_features mapping for embeddings
- Pipeline: elser_double_embedding_pipeline
Reindex Request:
POST /_reindex
{
"source": {
"index": "v1214_test_staging_20260506_181845"
},
"dest": {
"index": "v1214_test",
"pipeline": "elser_double_embedding_pipeline"
}
}
Task Status (Completed Successfully):
{
"completed": true,
"task": {
"description": "reindex from [v1214_test_staging_20260506_181845] to [v1214_test]",
"status": {
"total": 977,
"updated": 977,
"created": 0,
"deleted": 0,
"batches": 2,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 2,
"search": 0
}
}
},
"response": {
"total": 977,
"updated": 977,
"failures": []
}
}
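The only fields in this status that look unusual to me are retries.bulk (2 bulk actions were retried) and created (0, so every document overwrote an existing ID in the destination). A minimal sketch of how I pull those hints out in Python, assuming the task response above has been loaded into a dict; reindex_hints is a hypothetical helper name, not part of any Elasticsearch client:

```python
def reindex_hints(task):
    """Collect warning signs from a completed reindex task response (a plain dict)."""
    status = task["task"]["status"]
    hints = []

    # retries.bulk counts bulk actions that reindex had to retry.
    bulk_retries = status["retries"]["bulk"]
    if bulk_retries > 0:
        hints.append(f"{bulk_retries} bulk action(s) were retried")

    # created == 0 with updated == total means every ID already existed in the destination.
    if status["created"] == 0 and status["updated"] == status["total"]:
        hints.append("every document overwrote an existing ID")

    # Explicit failures, if any, are reported separately from the counters above.
    if task["response"]["failures"]:
        hints.append(f"{len(task['response']['failures'])} failure(s) reported")

    return hints


# Trimmed copy of the task response shown above.
task_response = {
    "completed": True,
    "task": {
        "status": {
            "total": 977, "updated": 977, "created": 0,
            "retries": {"bulk": 2, "search": 0},
        }
    },
    "response": {"total": 977, "updated": 977, "failures": []},
}

hints = reindex_hints(task_response)
```

For the response above this yields two hints (the bulk retries and the overwritten IDs) and nothing about failures.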
Problem: Despite successful completion, only 450 out of 977 documents have both embeddings. This count matches documents missing either field:
GET /v1214_test/_count
{
"query": {
"bool": {
"should": [
{"bool": {"must_not": {"exists": {"field": "name_embedding"}}}},
{"bool": {"must_not": {"exists": {"field": "content_embedding"}}}}
],
"minimum_should_match": 1
}
}
}
// Returns: 527 documents missing embeddings
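The combined count does not say whether name_embedding, content_embedding, or both are missing, or which documents are affected. A rough sketch of how the same filter could be split per field and turned into an ID-only search body, written as plain Python dicts so it does not depend on any client library; the helper names and the size of 100 are my own placeholders:

```python
def missing_embedding_query(fields):
    """Build a body matching documents that lack at least one of the given fields."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"bool": {"must_not": {"exists": {"field": f}}}}
                    for f in fields
                ],
                "minimum_should_match": 1,
            }
        }
    }


def missing_ids_search(fields, size=100):
    """Same filter as a _search body that returns document IDs without _source."""
    body = missing_embedding_query(fields)
    body["_source"] = False  # IDs are enough to look up the source documents later
    body["size"] = size
    return body


EMBEDDING_FIELDS = ["name_embedding", "content_embedding"]

# One _count body per field: shows whether one field or both are affected.
per_field_counts = {f: missing_embedding_query([f]) for f in EMBEDDING_FIELDS}

# Body for GET /v1214_test/_search to list the affected document IDs.
ids_body = missing_ids_search(EMBEDDING_FIELDS)
```

Each value in per_field_counts can be sent to GET /v1214_test/_count, and ids_body to GET /v1214_test/_search.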
Pipeline Configuration:
GET /_ingest/pipeline/elser_double_embedding_pipeline
{
"elser_double_embedding_pipeline": {
....
"processors": [
{
"inference": {
"model_id": ".elser_model_2_linux-x86_64",
"input_output": [
{
"input_field": "content",
"output_field": "content_embedding"
},
{
"input_field": "name",
"output_field": "name_embedding"
}
]
}
}
]
}
}
What I've Tried:
- Running update_by_query with pipeline: same result
- Checking for failures in task response: none reported
- Verifying source documents have name and content fields: they do
- Testing pipeline simulation: works correctly
- Running reindex multiple times: consistently ~46% success rate
Questions:
- Why would reindex report success but not apply the pipeline to all documents?
- How can I debug which documents are failing to get embeddings when no failures are reported?
- Is there a way to force the pipeline to process all documents or identify which ones were skipped?
The task description doesn't show the pipeline name (I expected it to include [elser_double_embedding_pipeline]), which makes me suspect the pipeline isn't being applied to every document, even though the API accepts the parameter without error.