Using reindex to generate embeddings from a nested field

Hello! I was reading this:

I tried to apply the idea of using a reindex command together with an ingest pipeline to generate embeddings for data that is already in an Elasticsearch index.

I defined the ingest pipeline like this:

PUT _ingest/pipeline/elser-v2-test
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2_linux-x86_64",
        "input_output": [ 
          {
            "input_field": "field1.field2.field3",
            "output_field": "headline_embedding"
          }
        ]
      }
    }
  ]
}

And then I run something like this:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "initial_index",
    "size": 50 
  },
  "dest": {
    "index": "index_with_embeddings",
    "pipeline": "elser-v2-test"
  }
}

But when I check the task with this command:

GET _tasks/OofTXXZuTgaA1afd0zXhRA:15419763

I see this exception:

"cause": {
  "type": "illegal_argument_exception",
  "reason": "[field3] is not an integer, cannot be used as an index as part of path [field1.field2.field3]"
},

The documentation example uses a non-nested field in the transformation. Is it possible to do the same with nested fields?

Hi!

I've just tried to replicate this with a nested field, and with input_output I do get the same error.

What did work was switching to the field_map option, which tells ELSER how to reach the specific field you want: it maps the path of your nested field to text_field, the input name the model usually expects.

PUT _ingest/pipeline/elser-v2-test-nest
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2",
        "target_field": "content_embedding",
        "field_map": {
          "nested_content.nested_field.extra_nest": "text_field"
        }
      }
    }
  ]
}

You can see more info about this method in the docs.
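
Before rerunning the reindex, it may be worth checking the pipeline against a single document with the simulate API. This is only a sketch: the sample text is made up, and the document shape simply mirrors the nested_content.nested_field.extra_nest path from my test pipeline above, so swap in your own field1.field2.field3 structure.

```json
POST _ingest/pipeline/elser-v2-test-nest/_simulate
{
  "docs": [
    {
      "_source": {
        "nested_content": {
          "nested_field": {
            "extra_nest": "A short sample sentence to embed."
          }
        }
      }
    }
  ]
}
```

If the mapping resolves, the simulated document in the response should contain the ELSER output under content_embedding. If that looks right, the same reindex request as before should work with "pipeline" in "dest" pointed at the new pipeline name.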

Let me know if this setup works with your nested example!
Hope it helps.

Kind regards,
Iulia