Embedding generation using E5 failing for some records

I am trying to use the E5 model to generate embeddings for some documents.

I used a reindex to generate the embeddings:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "source_index",
    "size": 50 
  },
  "dest": {
    "index": "destination_index",
    "pipeline": "e5-test"
  }
}

After almost 8 minutes I got this error:

"response": {
    "took": 472742,
    "timed_out": false,
    "total": 1997,
    "updated": 1749,
    "created": 0,
    "deleted": 0,
    "batches": 35,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
      "bulk": 0,
      "search": 0
    },
    "throttled": "0s",
    "throttled_millis": 0,
    "requests_per_second": -1,
    "throttled_until": "0s",
    "throttled_until_millis": 0,
    "failures": [
      {
        "index": "destination_index",
        "id": "324723984",
        "cause": {
          "type": "array_index_out_of_bounds_exception",
          "reason": "Index 9 out of bounds for length 8"
        },
        "status": 500
      }
    ]
  }

I see that none of the embeddings were generated, apparently because of this single record that failed.
Is there a way for reindex to continue when a single document fails the embedding generation process?

Hi!

You can enable error handling on the ingest pipeline that the reindex uses (in this case your e5-test) and decide what happens when an error occurs. By default, the pipeline stops as soon as a processor fails.

You can see the docs here

You can either set "ignore_failure": true to ignore the failure entirely, or, even better, add an on_failure handler that sets a message explaining the issue (adapt the message text to your case):

"on_failure": [
  {
    "set": {
      "description": "Set 'error.message'",
      "field": "error.message",
      "value": "Field 'provider' does not exist. Cannot rename to 'cloud.provider'",
      "override": false
    }
  }
]

If you want the pipeline to continue after an error, specify these properties inside the definition of the processor that can fail, not at the top level of the pipeline. A pipeline-level on_failure only acts as a fallback, and the pipeline still skips its remaining processors after running into an issue.
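
As a rough sketch of how that could look in your case, here is an e5-test pipeline with the handler attached directly to the inference processor. Your actual pipeline definition isn't shown, so the model ID (.multilingual-e5-small) and the field names (text, text_embedding) are assumptions; replace them with your own. The {{ _ingest.on_failure_message }} template variable is filled in with the actual error text, so you don't have to hard-code a message.

# Assumed: model_id, input_field and output_field are placeholders, not taken from your setup
PUT _ingest/pipeline/e5-test
{
  "processors": [
    {
      "inference": {
        "model_id": ".multilingual-e5-small",
        "input_output": [
          {
            "input_field": "text",
            "output_field": "text_embedding"
          }
        ],
        "on_failure": [
          {
            "set": {
              "description": "Record why embedding generation failed",
              "field": "error.message",
              "value": "{{ _ingest.on_failure_message }}"
            }
          }
        ]
      }
    }
  ]
}

With something like this in place, a document that fails inference should still be indexed, just without an embedding and with error.message set. You can then look up the failed documents after the reindex finishes:

GET destination_index/_search
{
  "query": {
    "exists": {
      "field": "error.message"
    }
  }
}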

Hope this helps!