Use ELSER on data already in Elastic

I was going through the documentation for the ELSER model, and I keep seeing that when you use an inference endpoint the model is applied to the data at ingestion time. Is there a way to populate the semantic_text field for data that is already in Elastic? Is there a command for that?
Thanks!

Hey Carlos,

You can use a pipeline to reindex data that already exists in Elastic. Many of the examples apply this pipeline at ingest time, but you can run it at any later time and against other source indices.

Check out this tutorial, for example. You can skip the first step, which generates the starting data (assume that is the data you already have in Elastic), and go straight to the reindex command:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 10 
  },
  "dest": {
    "index": "semantic-embeddings"
  }
}

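This fills a new index whose documents also contain the newly generated semantic_text field. In the tutorial the destination index is created beforehand with a mapping that declares that field, so the index mapping is where you choose the name of the target field and the data structure.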
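To make the mapping step concrete, here is a minimal sketch of the two request bodies involved, written as plain Python dictionaries so they can be pasted into any client. The field name `content` and the inference endpoint id `my-elser-endpoint` are assumptions for illustration, not values from this thread; replace them with whatever your cluster uses.

```python
import json

# Hypothetical names: adjust to your own cluster.
INFERENCE_ID = "my-elser-endpoint"  # assumed id of an existing ELSER inference endpoint
FIELD = "content"                   # assumed text field present in the source documents


def destination_mapping(field: str, inference_id: str) -> dict:
    """Body for `PUT <dest-index>`: declares `field` as a semantic_text field."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "semantic_text", "inference_id": inference_id}
            }
        }
    }


def reindex_body(source: str, dest: str, batch_size: int = 10) -> dict:
    """Body for `POST _reindex`: copies documents from source to dest in small batches."""
    return {
        "source": {"index": source, "size": batch_size},
        "dest": {"index": dest},
    }


if __name__ == "__main__":
    # Print the requests in console form; nothing is sent to a cluster here.
    print("PUT semantic-embeddings")
    print(json.dumps(destination_mapping(FIELD, INFERENCE_ID), indent=2))
    print("POST _reindex?wait_for_completion=false")
    print(json.dumps(reindex_body("test-data", "semantic-embeddings"), indent=2))
```

The destination index has to exist with this mapping before the reindex starts. A small batch size keeps each inference call short, which is why the tutorial request uses 10.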

Here is another notebook example.

There is also a similar question with some more relevant code examples.

I ended up using a slightly modified version of your example that specifically references the pipeline I wanted to use:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "source-index",
    "size": 50 
  },
  "dest": {
    "index": "destination-index",
    "pipeline": "elser-v2-test"
  }
}

But thanks for the answer!
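For anyone landing here later, a rough sketch of what the pipeline variant might look like end to end. The pipeline body is an assumption (the thread does not show how `elser-v2-test` was defined), as are the field names `content` and `content_embedding`; `.elser_model_2` is the built-in ELSER v2 model id and is assumed to be deployed. The last helper relies on the fact that a reindex started with `wait_for_completion=false` returns a task id that can be polled through the tasks API.

```python
import json

PIPELINE_ID = "elser-v2-test"  # pipeline name from the reply above
MODEL_ID = ".elser_model_2"    # built-in ELSER v2 model id (assumed to be deployed)


def elser_pipeline(input_field: str, output_field: str, model_id: str = MODEL_ID) -> dict:
    """Body for `PUT _ingest/pipeline/<id>`: a single inference processor.

    The field names are hypothetical; the destination index should map
    `output_field` as sparse_vector before documents are written.
    """
    return {
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    "input_output": [
                        {"input_field": input_field, "output_field": output_field}
                    ],
                }
            }
        ]
    }


def reindex_with_pipeline(source: str, dest: str, pipeline: str, batch_size: int = 50) -> dict:
    """Body for `POST _reindex`: routes every copied document through `pipeline`."""
    return {
        "source": {"index": source, "size": batch_size},
        "dest": {"index": dest, "pipeline": pipeline},
    }


def task_status_path(reindex_response: dict) -> str:
    """Path for `GET _tasks/<task_id>`, built from the async reindex response."""
    return "_tasks/" + reindex_response["task"]


if __name__ == "__main__":
    print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
    print(json.dumps(elser_pipeline("content", "content_embedding"), indent=2))
    print("POST _reindex?wait_for_completion=false")
    print(json.dumps(reindex_with_pipeline("source-index", "destination-index", PIPELINE_ID), indent=2))
```

Once the reindex call returns something like `{"task": "<node>:<number>"}`, passing that response to `task_status_path` gives the path to poll until the task reports it has completed.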
