Ingest pipeline ELSER embedding fails with more than 1 ML node

Hello Team,

We can create embeddings using an ingest pipeline with 1 ML node. But when we add a second ML node, it seems that none of the documents get ingested through the pipeline. A screenshot is attached for reference.

Note: running a single document through the simulate pipeline API works fine, but ingesting documents in batches fails.

Can you advise how we can troubleshoot errors while ingesting through the pipeline? We have an "on_failure" processor, but it does not log anything in this case, so we have no idea what is happening and are unable to ingest any data into the index.

"on_failure": [
  {
    "set": {
      "description": "Index document to '<index>'",
      "field": "_index",
      "value": "{{{_index}}}"
    }
  },
  {
    "set": {
      "description": "Set error message",
      "field": "ingest.failure",
      "value": "{{ _ingest.on_failure_processor_type }} processor in pipeline {{ _ingest.on_failure_pipeline }} failed with message: {{ _ingest.on_failure_message }}"
    }
  }
]

The first place to look is the Inference Count field in the deployment stats UI. This should be increasing and continuously updating (you might need to hit the refresh button or check the UI refresh interval).

Your elser-model-2-for-ingest deployment is on 2 ML nodes, so you should see the inference count increasing on both nodes, as the ingest requests will be spread across all the nodes and allocations you have.

In your screenshot I see the Pending Count field is at 49, which means there are 49 inference requests queued up for processing. You should see this number go down and the inference count go up.

If you do not see Pending Count decrease, that is an indication that ingest is somehow stuck. I would restart the deployment if this is the case.
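If you would rather check this outside the UI, the same counters should also be available from the trained model stats API (GET _ml/trained_models/<model_id>/_stats). Here is a rough sketch of the check described above, assuming the response lists each node under deployment_stats.nodes with inference_count and number_of_pending_requests fields. The function names are made up for illustration and the field names may differ between versions, so please verify them against your cluster's actual response.

```python
from typing import Dict, List, Tuple


def summarize_nodes(stats: dict) -> List[dict]:
    """Flatten the per-node counters from a trained model stats response.

    Assumed layout (verify against your version):
    trained_model_stats[].deployment_stats.nodes[].{inference_count, ...}
    """
    rows = []
    for model in stats.get("trained_model_stats", []):
        deployment = model.get("deployment_stats") or {}
        for node in deployment.get("nodes", []):
            # Assumption: node details are wrapped under a single node-ID key.
            node_id = next(iter(node.get("node", {})), "unknown")
            rows.append({
                "deployment_id": deployment.get("deployment_id"),
                "node_id": node_id,
                "inference_count": node.get("inference_count", 0),
                "pending": node.get("number_of_pending_requests", 0),
                "errors": node.get("error_count", 0),
            })
    return rows


def stuck_nodes(before: List[dict], after: List[dict]) -> List[str]:
    """Return node IDs that still have queued requests but whose
    inference count did not advance between the two snapshots."""
    earlier: Dict[Tuple[str, str], dict] = {
        (r["deployment_id"], r["node_id"]): r for r in before
    }
    stuck = []
    for row in after:
        prev = earlier.get((row["deployment_id"], row["node_id"]))
        if prev is None:
            continue  # node was not present in the first snapshot
        no_progress = row["inference_count"] <= prev["inference_count"]
        if row["pending"] > 0 and no_progress:
            stuck.append(row["node_id"])
    return stuck
```

Take two snapshots of the stats response a few seconds apart while ingest is running, pass each through summarize_nodes, and then call stuck_nodes(first, second). An empty list means every node with queued work is still making progress. Any node IDs that come back are the ones to look at before restarting the deployment.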

What version of Elasticsearch are you using please?
