How to delete a field from an index while reindexing?

I have an index that I want to reindex into a new index whose mapping has dynamic set to strict, but the reindex then complains that a field doesn't exist in the new index mapping. How can I remove that field from the old documents so that reindexing works?

You can use an ingest pipeline for this. An ingest pipeline pre-processes a document before it is indexed. In this case, you can use a pipeline with the remove processor, which removes a field from your documents.

The first step is to define your pipeline. If you want to remove the foo field, your pipeline would look like this:

PUT _ingest/pipeline/my_pipeline
{
  "description": "Removes the 'foo' field", 
  "processors": [
    {
      "remove": {
        "field": "foo"
      }
    }
  ]
}
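
Before reindexing, my_new_index should already exist with its strict mapping in place, because _reindex would otherwise normally create the destination index with default dynamic mappings. A minimal sketch, assuming Elasticsearch 7.x or later (on 6.x the mappings are nested under a type name) and a hypothetical bar field standing in for the fields you actually want to keep:

```
PUT my_new_index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "bar": {
        "type": "keyword"
      }
    }
  }
}
```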

Now you can reindex your data and apply my_pipeline to your documents as they get reindexed:

POST _reindex
{
  "source": {
    "index": "my_old_index"
  },
  "dest": {
    "index": "my_new_index",
    "pipeline": "my_pipeline"
  }
}

I get an error with the reason "field [foo] not present as part of path [foo]". That field exists in about half the documents but not all of them. Can the ingest pipeline still work in this case?

@abdon Is there a way to do this via _update_by_query where the query is a match_all query?

@i333 Yes, as the docs explain, you can set ignore_missing to true:

PUT _ingest/pipeline/my_pipeline
{
  "description": "Removes the 'foo' field",
  "processors": [
    {
      "remove": {
        "field": "foo",
        "ignore_missing": true
      }
    }
  ]
}

This requires that you are on version 6.4 or later.
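
If you want to check the pipeline against both kinds of documents before running the full reindex, the simulate pipeline API is one way to do it. A small sketch; the two sample documents and the bar field are made up for illustration:

```
POST _ingest/pipeline/my_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "foo": "some value",
        "bar": 1
      }
    },
    {
      "_source": {
        "bar": 2
      }
    }
  ]
}
```

With ignore_missing set to true, the first document should come back without foo and the second should pass through without the "not present" error.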


@Jaspreet_Singh Yes, you can also use an ingest pipeline with _update_by_query:

POST my_index/_update_by_query?pipeline=my_pipeline
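
Leaving out the request body makes _update_by_query process every document in the index, which has the same effect as an explicit match_all query. As a sketch with the query spelled out, using the same my_index and my_pipeline names as above:

```
POST my_index/_update_by_query?pipeline=my_pipeline
{
  "query": {
    "match_all": {}
  }
}
```

To touch only the documents that still contain the field, the match_all could be swapped for an exists query on foo.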

@abdon Very interesting, thanks for sharing.
