Context Error During Reindex with ELSER

Hello,

I am currently running into an error while reindexing documents from an index containing a subset of my data. The reindex runs them through an ingest pipeline that uses ELSER to create vector embeddings and splits each document into passages, which also get vector embeddings. I use a pipeline similar to this one.

The error in question looks like this:

{
  "completed": true,
  "task": {
    "node": "nodeId",
    "id": 168105,
    "type": "transport",
    "action": "indices:data/write/reindex",
    "status": {
      "total": 10000,
      "updated": 0,
      "created": 59,
      "deleted": 0,
      "batches": 59,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1,
      "throttled_until_millis": 0
    },
    "description": "reindex from [1-subset] to [1-elser]",
    "start_time_in_millis": 1719323450496,
    "running_time_in_nanos": 1067795555450,
    "cancellable": true,
    "cancelled": false,
    "headers": {
      "trace.id": "3b984be232f7068413ddfb716b77fc02"
    }
  },
  "error": {
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": -1,
        "index": null,
        "reason": {
          "type": "search_context_missing_exception",
          "reason": "No search context found for id [14257]"
        }
      }
    ],
    "caused_by": {
      "type": "search_context_missing_exception",
      "reason": "No search context found for id [14257]"
    }
  }
}

I recognize this error: a while back I hit the same thing in my local Docker container when reindexing documents that took too long. I had thought it was my local hardware, but this time I am running on a cloud trial and I still hit this error.

Initially I thought the problem was that I had set the size too high, and lowering it did fix the issue at the time. Now the error is back even though I have set the size to 1.

My Reindex request:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "Index1",
    "size": 1
  }, 
  "dest": {
    "index": "Index2",
    "pipeline": "chunker-elser-v2"
  }
}

The reindex request is pretty simple: set the size to 1 and add the pipeline.

My Pipeline:


PUT _ingest/pipeline/chunker-elser-v2
{
  "processors": [
    {
      "script": {
        "description": "Chunk content into sentences by looking for . followed by a space",
        "lang": "painless",
        "if": "ctx.content != null && !ctx.content.isEmpty()",
        "source": "\n String[] envSplit = /((?<!M(r|s|rs)\\.)(?<=\\.) |(?<=\\!) |(?<=\\?) )/.split(ctx['content']);\n ctx['passages'] = new ArrayList();\n        int i = 0;\n        boolean remaining = true;\n        if (envSplit.length == 0) {\n          return\n        } else if (envSplit.length == 1) {\n          Map passage = ['text': envSplit[0]];ctx['passages'].add(passage)\n        } else {\n          while (remaining) {\n            Map passage = ['text': envSplit[i++]];\n            while (i < envSplit.length && passage.text.length() + envSplit[i].length() < params.model_limit) {passage.text = passage.text + ' ' + envSplit[i++]}\n            if (i == envSplit.length) {remaining = false}\n            ctx['passages'].add(passage)\n          }\n        }\n        ",
        "params": {
          "model_limit": 400
        }
      }
    },
    {
      "foreach": {
        "field": "passages",
        "processor": {
          "inference": {
            "model_id": ".elser_model_2",
            "input_output": {
              "input_field": "_ingest._value.text",
              "output_field": "_ingest._value.vector.predicted_value"
            },
            "on_failure": [
              {
                "append": {
                  "field": "_source._ingest.inference_errors",
                  "value": [
                    {
                      "message": "Processor 'inference' in pipeline 'chunker-elser-v2' failed with message '{{ _ingest.on_failure_message }}'",
                      "pipeline": "ml-inference-title-vector",
                      "timestamp": "{{{ _ingest.timestamp }}}"
                    }
                  ]
                }
              }
            ]
          }
        },
        "if": "ctx.passages != null"
      }
    },
    {
      "inference": {
        "if": "ctx.title != null && !ctx.title.isEmpty()",
        "model_id": ".elser_model_2",
        "input_output": {
          "input_field": "title",
          "output_field": "ml.title.vector.predicted_value"
        },
        "on_failure": [
          {
            "append": {
              "field": "_source._ingest.inference_errors",
              "value": [
                {
                  "message": "Processor 'inference' in pipeline 'ml-inference-title-vector' failed with message '{{ _ingest.on_failure_message }}'",
                  "pipeline": "ml-inference-title-vector",
                  "timestamp": "{{{ _ingest.timestamp }}}"
                }
              ]
            }
          }
        ]
      }
    }
  ]
}

As mentioned before, the pipeline is based on this example from Elastic's blog, though I modified it slightly.
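
To verify that the chunking script behaves as expected on its own, the pipeline can be tested with the simulate API (the sample document below is made up):

POST _ingest/pipeline/chunker-elser-v2/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Test document",
        "content": "First sentence. Second sentence! Third sentence?"
      }
    }
  ]
}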

Would increasing the scroll time be a viable solution?

Any idea how to fix this would be appreciated!

I have also seen this blog pop up, which is awesome!

Hi @Chenko,

Have you tried using the scroll query parameter for the Reindex API?

This should allow you to increase how long the search context sticks around.

You may also want to check to make sure you're not flooding the inference queue and getting inference errors. You can manually throttle your reindex operation with requests_per_second, but hopefully that will not be necessary.
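
For illustration, reusing your request and the node/task ID from the task status you posted (the value of 10 requests per second is arbitrary): the throttle can be set when the reindex starts, or changed on a running task with the rethrottle API.

POST _reindex?requests_per_second=10&wait_for_completion=false
{
  "source": {
    "index": "Index1",
    "size": 1
  },
  "dest": {
    "index": "Index2",
    "pipeline": "chunker-elser-v2"
  }
}

POST _reindex/nodeId:168105/_rethrottle?requests_per_second=10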

Hi Sean,

Thanks for the swift response!

Yes, I was thinking of increasing the time the search context sticks around. I am however not sure what I should raise it to. IIRC 5 minutes is the default, which would seem to be plenty to embed just 1 document.

I assume this would not be a problem since I am limiting the size to 1 in the reindex call.

PS: The documents could have a lot of passages, though it should not be an insane number of them. (I currently can't think of a way to determine the maximum, though.)

Would it perhaps be a better idea to do an update_by_query with a match_all instead of a reindex? Or would this produce the same result? Something like the sketch below is what I have in mind.
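
A minimal sketch of that idea, assuming the pipeline would enrich the documents in place rather than copy them to a new index:

POST Index1/_update_by_query?pipeline=chunker-elser-v2&wait_for_completion=false&scroll=30m
{
  "query": {
    "match_all": {}
  }
}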

Currently, I have modified my reindex request to the following:

POST _reindex?wait_for_completion=false&refresh=true&scroll=30m
{
  "source": {
    "index": "Index1",
    "size": 1
  }, 
  "dest": {
    "index": "Index2",
    "pipeline": "chunker-elser-v2"
  }
}

Hopefully the refresh and scroll time increase make a difference!

Yeah, that's a fair point. Again, I'd suggest checking whether you have inference errors. It's possible that your ML node or your model crashed.
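
Two places to look: the trained model's deployment stats, and any documents where your pipeline's on_failure handler recorded an error (the field name below is taken from your pipeline definition, and the index name is assumed):

GET _ml/trained_models/.elser_model_2/_stats

GET Index2/_search
{
  "query": {
    "exists": {
      "field": "_ingest.inference_errors"
    }
  }
}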

I'd expect the same result eventually. Reindex should be the more efficient of the two.

Thanks, I upped the scroll time as shown above and it no longer seems to throw any errors. However, if my calculations are right, the 10,000 documents will take another ~35 hours to embed, so I will have to wait a bit to see the actual result. In the meantime I can keep an eye on progress with the task API, as shown below.
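
Using the node and task ID from the task status earlier in this thread:

GET _tasks/nodeId:168105

The created and batches counters in the response show how far along the reindex is.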