Reindexing an index whose documents were added by an ingest pipeline

Aditya_Teltia · July 9, 2023, 6:31am

I have an index my-idx-09-2022. I created an ingest pipeline so that all writes to my-idx-09-2022 from now on go to a new index, i.e. my-idx-new-09-2023.

Python code:

def create_write_redirect_pipeline(source_client, pipeline_name, new_index):
    source_client.ingest.put_pipeline(
        id=pipeline_name,
        body={
            "processors": [
                {
                    "set": {
                        "field": "_index",
                        "value": new_index
                    }
                }
            ]
        }
    )

def associate_pipeline_with_index(source_client, source_index, pipeline_name):
    source_client.indices.put_settings(
        index=source_index,
        body={
            "index": {
                "default_pipeline": pipeline_name
            }
        }
    )

Now, when I try to reindex this new index my-idx-new-09-2023 into an existing index in cluster2, let's say restored_idx, I get this error:

HTTP_EXCEPTIONS.get(meta.status, ApiError)(
elasticsearch.BadRequestError: BadRequestError(400, "{'took': 9, 'timed_out': False, 'total': 10, 'updated': 0, 'created': 0, 'deleted': 0, 'batches': 1, 'version_conflicts': 0, 'noops': 0, 'retries': {'bulk': 0, 'search': 0}, 'throttled_millis': 0, 'requests_per_second': -1.0, 'throttled_until_millis': 0, 'failures': [{'index': 'restored_idx', 'type': '_doc', 'id': 'AVeRNIkBgj9Doa2aiWL2', 'cause': {'type': 'illegal_argument_exception', 'reason': 'pipeline with id [write_redirect_pipeline] does not exist'}, 'status': 400}, {'index': 'restored_idx', 'type': '_doc', 'id': 'AleRNIkBgj9Doa2aimIj', 'cause': {'type': 'illegal_argument_exception', 'reason': 'pipeline with id [write_redirect_pipeline] does not exist'}, 'status': 400}, {'index': 'restored_idx', 'type': '_doc', 'id': 'A1eRNIkBgj9Doa2aimIx', 'cause': {'type': 'illegal_argument_exception', 'reason': 'pipeline with id [write_redirect_pipeline] does not exist'}, 'status': 400}, {'index': 'restored_idx', 'type': '_doc', 'id': 'BFeRNIkBgj9Doa2aimI-', 'cause': {'type': 'illegal_argument_exception', 'reason': 'pipeline with id [write_redirect_pipeline] does not exist'}, 'status': 400}, {'index': 'restored_idx', 'type': '_doc', 'id': 'BVeRNIkBgj9Doa2aimJO', 'cause': {'type': 'illegal_argument_exception', 'reason': 'pipeline with id [write_redirect_pipeline] does not exist'}, 'status': 400}, {'index': 'restored_idx', 'type': '_doc', 'id': 'BleRNIkBgj9Doa2aimJd', 'cause': {'type': 'illegal_argument_exception', 'reason': 'pipeline with id [write_redirect_pipeline] does not exist'}, 'status': 400}, {'index': 'restored_idx', 'type': '_doc', 'id': 'B1eRNIkBgj9Doa2aimJn', 'cause': {'type': 'illegal_argument_exception', 'reason': 'pipeline with id [write_redirect_pipeline] does not exist'}, 'status': 400}, {'index': 'restored_idx', 'type': '_doc', 'id': 'CFeRNIkBgj9Doa2aimJ1', 'cause': {'type': 'illegal_argument_exception', 'reason': 'pipeline with id [write_redirect_pipeline] does not exist'}, 'status': 400}, {'index': 
'restored_idx', 'type': '_doc', 'id': 'CVeRNIkBgj9Doa2aimKA', 'cause': {'type': 'illegal_argument_exception', 'reason': 'pipeline with id [write_redirect_pipeline] does not exist'}, 'status': 400}, {'index': 'restored_idx', 'type': '_doc', 'id': 'CleRNIkBgj9Doa2aimKO', 'cause': {'type': 'illegal_argument_exception', 'reason': 'pipeline with id [write_redirect_pipeline] does not exist'}, 'status': 400}]}")

Reindexing works when I use any new index other than restored_idx as the destination. Am I missing something here?

Aditya_Teltia · July 9, 2023, 3:55pm

This is the cause of the error.

stephenb · July 9, 2023, 4:30pm

It's because you've set a default_pipeline in the index settings so any document indexed will look for that pipeline... I suspect that pipeline does not exist in your cluster2.

I would suggest running the reindex commands directly from Kibana Dev Tools to make sure they work, and then trying them in Python.
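The default_pipeline lookup described above can be checked before running a reindex. The following is a minimal sketch, not something from the thread: the function name is hypothetical, and it assumes you pass in the parsed JSON of `GET <index>/_settings` and `GET _ingest/pipeline`, both taken from the cluster that holds the destination index.

```python
def missing_default_pipelines(index_settings, pipelines):
    """Return {index_name: pipeline_id} for default pipelines that are not defined.

    index_settings: parsed JSON of `GET <index>/_settings`
    pipelines: parsed JSON of `GET _ingest/pipeline` (a dict keyed by pipeline id)
    """
    missing = {}
    for index_name, body in index_settings.items():
        index_block = body.get("settings", {}).get("index", {})
        pipeline_id = index_block.get("default_pipeline")
        # "_none" is the built-in value meaning "no default pipeline".
        if pipeline_id and pipeline_id != "_none" and pipeline_id not in pipelines:
            missing[index_name] = pipeline_id
    return missing


if __name__ == "__main__":
    # Hypothetical responses, shaped like the settings output shown in this thread.
    settings = {
        "restored_index": {
            "settings": {"index": {"default_pipeline": "write-pipeline"}}
        }
    }
    print(missing_default_pipelines(settings, pipelines={}))
```

An empty result would suggest that every default pipeline referenced by those indices exists on that cluster; a non-empty one points at the index and pipeline id to look into.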

Aditya_Teltia · July 9, 2023, 4:58pm

Aditya_Teltia:

source_client.indices.put_settings(
        index=source_index,
        body={
            "index": {
                "default_pipeline": pipeline_name
            }
        }
    )

I have set default_pipeline only on source_index, which I am not even reindexing. Also, the reindex works into any new index in cluster2, just not into an already existing index.

Aditya_Teltia · July 9, 2023, 4:58pm

Yes, I initially tried it in Kibana and got the same error.

stephenb · July 9, 2023, 6:09pm

What would help, if you want help, is to show the complete commands and their complete results from Kibana Dev Tools. I / we can't help much with just pieces of information.

Post a full example, then perhaps we can diagnose.

Aditya_Teltia · July 9, 2023, 6:45pm

Created an index my-idx-09-2022

Then I created a pipeline write-pipeline that sets _index to my-idx-new-09-2023 and associated it with my initial index my-idx-09-2022, so that any indexing request goes to this new index.

I took a snapshot of this index and restored it as restored_index.

These are the documents in restored index after snapshot and restore.

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "restored_index",
        "_type" : "_doc",
        "_id" : "jf_aO4kBNxe2IgaAbnJp",
        "_score" : 1.0,
        "_source" : {
          "name" : "aditya"
        }
      },
      {
        "_index" : "restored_index",
        "_type" : "_doc",
        "_id" : "jv_aO4kBNxe2IgaAkXKn",
        "_score" : 1.0,
        "_source" : {
          "name" : "John Doe"
        }
      }
    ]
  }
}

These are the documents in my-idx-new-09-2023

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-idx-new-09-2023",
        "_type" : "_doc",
        "_id" : "j__jO4kBNxe2IgaAunJT",
        "_score" : 1.0,
        "_source" : {
          "name" : "John Doe2"
        }
      },
      {
        "_index" : "my-idx-new-09-2023",
        "_type" : "_doc",
        "_id" : "kf_oO4kBNxe2IgaAcnIn",
        "_score" : 1.0,
        "_source" : {
          "name" : "John Doe3"
        }
      }
    ]
  }
}

I do reindexing

It does not show a failure, I guess because I am doing this within the same cluster, but the documents of my-idx-new-09-2023 are still not present in restored_idx. It is still the same.

stephenb · July 9, 2023, 6:59pm

Thanks!!!

Apologies for being picky, but screenshots are very hard to work with: I can't test them on my end without typing them in, which is error-prone. It is better to paste the text of the command and the text of the output; that is much easier to help with.

And I am still confused about whether you have 2 clusters or 1, and where the error is happening.

Also, that last screenshot shows "updated" : 2. That is most likely because you have the same document ids in the source and destination of the reindex, so it is updating the documents, not creating new ones. What is your expected result? If both the source and destination indices came from the same source and were then restored, they would have the same _ids, but your example shows different ones. I suspect one is from one cluster and the other is from the other.

And if you cannot reproduce the error, I am not sure how I can.

In short, wherever you are running this, make sure the ingest pipeline is there. That error is very clear: the ingest pipeline does not exist on the target cluster.

You also did not show the index settings for all indices, which may be the same or different.

Also, if you are using a newer version, there is a reroute processor, which would be the more correct way to do index re-routing.
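To make the reroute suggestion concrete, here is a hedged sketch of a helper that builds either pipeline body. The helper name and the `use_reroute` flag are made up for illustration. The `reroute` branch assumes a recent 8.x cluster that ships the reroute processor and uses its `destination` option; the index settings posted later in this thread show version 7171099, so that branch would presumably not apply there.

```python
def redirect_pipeline_body(new_index, use_reroute=False):
    """Build an ingest pipeline body that sends documents to `new_index`."""
    if use_reroute:
        # Assumption: the cluster version includes the reroute processor.
        processor = {"reroute": {"destination": new_index}}
    else:
        # Same approach as the original post: overwrite the _index metadata field.
        processor = {"set": {"field": "_index", "value": new_index}}
    return {"processors": [processor]}


if __name__ == "__main__":
    print(redirect_pipeline_body("my-idx-new-09-2023"))
    print(redirect_pipeline_body("my-idx-new-09-2023", use_reroute=True))
```

Either body could be passed as `body=` to `ingest.put_pipeline`, the same call used in the first post.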

stephenb · July 9, 2023, 7:20pm

Note the same _id in the source and restored versions of the index.

So if you reindex into the restored index with the same ids, it will update the existing documents, not create new entries.

And this is how to post examples. Also be clear about which of the clusters you are operating in.

POST discuss-test-source/_doc
{
  "foo" : "bar"
}

PUT _snapshot/found-snapshots/discuss-test-snap
{
  "indices": "discuss-test-source",
  "include_global_state": false
}

POST _snapshot/found-snapshots/discuss-test-snap/_restore
{
  "indices": "discuss-test-source",
  "rename_pattern": "discuss-test-(.+)",
  "rename_replacement": "restored-discuss-test-$1"
}

GET restored-discuss-test-source/_search

# GET restored-discuss-test-source/_search 200 OK
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "restored-discuss-test-source",
        "_id": "sEQOPIkBFT0mnAMUW-PM",
        "_score": 1,
        "_source": {
          "foo": "bar"
        }
      }
    ]
  }
}

GET discuss-test-source/_search
# GET discuss-test-source/_search 200 OK
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "discuss-test-source",
        "_id": "sEQOPIkBFT0mnAMUW-PM",
        "_score": 1,
        "_source": {
          "foo": "bar"
        }
      }
    ]
  }
}
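The point about matching _ids can be illustrated without a cluster. This is only a sketch with a hypothetical function name; it assumes the default reindex behavior (documents with an existing _id are overwritten) and that no pipeline or script changes the _id on the way.

```python
def predict_reindex_outcome(source_ids, dest_ids):
    """Split source document ids into those expected to be created vs. updated."""
    existing = set(dest_ids)
    created = [doc_id for doc_id in source_ids if doc_id not in existing]
    updated = [doc_id for doc_id in source_ids if doc_id in existing]
    return {"created": created, "updated": updated}


if __name__ == "__main__":
    # The snapshot/restore example above: the same _id exists on both sides.
    outcome = predict_reindex_outcome(
        source_ids=["sEQOPIkBFT0mnAMUW-PM"],
        dest_ids=["sEQOPIkBFT0mnAMUW-PM"],
    )
    print(outcome)
```

For that example every source id already exists in the destination, so the reindex would be expected to report updates rather than creations.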

Aditya_Teltia · July 9, 2023, 7:29pm

Thank you for responding. Okay let me recreate everything order-wise.

Creating an index my-idx-09-2022 in cluster1

Query:

POST my-idx-09-2022/_doc
{
  "name" : "Aditya"
}

Output:

{
  "_index" : "my-idx-09-2022",
  "_type" : "_doc",
  "_id" : "7rEUPIkBgCbkoQPXAwjV",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Create an ingest pipeline and set it as the default pipeline in the settings of my-idx-09-2022

Query:

PUT _ingest/pipeline/write-pipeline
{
  "processors" : [
    {
      "set" : {
        "field": "_index",
        "value": "my-idx-new-09-2023"
      }
    }
  ]
}


PUT my-idx-09-2022/_settings
{
  "default_pipeline": "write-pipeline"
}

Output:

{
  "acknowledged" : true
}

Snapshot my-idx-09-2022 and restore it to cluster2 as restored_index

Query:

PUT /_snapshot/s3testing/snap1
{
  "indices":"my-idx-09-2022",
  "ignore_unavailable":true,
  "include_global_state":false
}

POST /_snapshot/s3testing/snap1/_restore?wait_for_completion=true
{
  "indices":"my-idx-09-2022",
  "rename_pattern":"my-idx-09-2022",
  "rename_replacement":"restored_index"
}

I don't know how to use Kibana with multiple hosts, so I did it using Postman.

http://localhost:9201/_snapshot/s3testing/snap1/_restore?wait_for_completion=true
{
  "indices":"my-idx-09-2022",
  "rename_pattern":"my-idx-09-2022",
  "rename_replacement":"restored_index"
}

Output:

{
  "accepted" : true
}

{
    "snapshot": {
        "snapshot": "snap1",
        "indices": [
            "restored_index"
        ],
        "shards": {
            "total": 1,
            "failed": 0,
            "successful": 1
        }
    }
}

Adding more documents to my-idx-09-2022; because of the ingest pipeline, the writes go to my-idx-new-09-2023.

Query:

POST my-idx-09-2022/_doc
{
  "name" : "John Doe"
}

Output:

{
  "_index" : "my-idx-new-09-2023",
  "_type" : "_doc",
  "_id" : "8LEcPIkBgCbkoQPXQwh3",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

I added more documents the same way, 5 in total. A _search on the new index my-idx-new-09-2023 gives this output:

{
  "took" : 298,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-idx-new-09-2023",
        "_type" : "_doc",
        "_id" : "8LEcPIkBgCbkoQPXQwh3",
        "_score" : 1.0,
        "_source" : {
          "name" : "John Doe"
        }
      },
      {
        "_index" : "my-idx-new-09-2023",
        "_type" : "_doc",
        "_id" : "8bEdPIkBgCbkoQPXQwiD",
        "_score" : 1.0,
        "_source" : {
          "name" : "John Doe2"
        }
      },
      {
        "_index" : "my-idx-new-09-2023",
        "_type" : "_doc",
        "_id" : "8rEdPIkBgCbkoQPXTgjN",
        "_score" : 1.0,
        "_source" : {
          "name" : "John Doe3"
        }
      },
      {
        "_index" : "my-idx-new-09-2023",
        "_type" : "_doc",
        "_id" : "87EdPIkBgCbkoQPXWAhd",
        "_score" : 1.0,
        "_source" : {
          "name" : "John Doe4"
        }
      },
      {
        "_index" : "my-idx-new-09-2023",
        "_type" : "_doc",
        "_id" : "9LEdPIkBgCbkoQPXZAj9",
        "_score" : 1.0,
        "_source" : {
          "name" : "John Doe5"
        }
      }
    ]
  }
}

Now I reindex this new index my-idx-new-09-2023 into my already existing index in cluster2, i.e. restored_index.

Query:

http://localhost:9201/_reindex?wait_for_completion=true
{
    "source": {
        "remote": {
            "host": "http://localhost:9200"
        },
        "index": "my-idx-new-09-2023"
    },
    "dest": {
        "index": "restored_index"
    }
}

Output:

{
    "took": 19,
    "timed_out": false,
    "total": 5,
    "updated": 0,
    "created": 0,
    "deleted": 0,
    "batches": 1,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
        "bulk": 0,
        "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1.0,
    "throttled_until_millis": 0,
    "failures": [
        {
            "index": "restored_index",
            "type": "_doc",
            "id": "8LEcPIkBgCbkoQPXQwh3",
            "cause": {
                "type": "illegal_argument_exception",
                "reason": "pipeline with id [write-pipeline] does not exist"
            },
            "status": 400
        },
        {
            "index": "restored_index",
            "type": "_doc",
            "id": "8bEdPIkBgCbkoQPXQwiD",
            "cause": {
                "type": "illegal_argument_exception",
                "reason": "pipeline with id [write-pipeline] does not exist"
            },
            "status": 400
        },
        {
            "index": "restored_index",
            "type": "_doc",
            "id": "8rEdPIkBgCbkoQPXTgjN",
            "cause": {
                "type": "illegal_argument_exception",
                "reason": "pipeline with id [write-pipeline] does not exist"
            },
            "status": 400
        },
        {
            "index": "restored_index",
            "type": "_doc",
            "id": "87EdPIkBgCbkoQPXWAhd",
            "cause": {
                "type": "illegal_argument_exception",
                "reason": "pipeline with id [write-pipeline] does not exist"
            },
            "status": 400
        },
        {
            "index": "restored_index",
            "type": "_doc",
            "id": "9LEdPIkBgCbkoQPXZAj9",
            "cause": {
                "type": "illegal_argument_exception",
                "reason": "pipeline with id [write-pipeline] does not exist"
            },
            "status": 400
        }
    ]
}

This is how you can recreate this error. Apologies for using Postman for some of the queries.
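Long `failures` arrays like the one above are easier to read when grouped by reason. A small sketch, assuming the reindex response has already been parsed into a Python dict with the shape shown in this post; the function name is hypothetical.

```python
from collections import Counter


def summarize_reindex_failures(response):
    """Count reindex failures by (destination index, failure reason)."""
    counts = Counter()
    for failure in response.get("failures", []):
        cause = failure.get("cause", {})
        counts[(failure.get("index"), cause.get("reason"))] += 1
    return counts


if __name__ == "__main__":
    # Trimmed-down copy of the response above: two of the five failures.
    response = {
        "total": 5,
        "created": 0,
        "failures": [
            {
                "index": "restored_index",
                "id": "8LEcPIkBgCbkoQPXQwh3",
                "cause": {
                    "type": "illegal_argument_exception",
                    "reason": "pipeline with id [write-pipeline] does not exist",
                },
                "status": 400,
            },
            {
                "index": "restored_index",
                "id": "8bEdPIkBgCbkoQPXQwiD",
                "cause": {
                    "type": "illegal_argument_exception",
                    "reason": "pipeline with id [write-pipeline] does not exist",
                },
                "status": 400,
            },
        ],
    }
    for (index, reason), count in summarize_reindex_failures(response).items():
        print(f"{count} x {index}: {reason}")
```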

Christian_Dahlqvist · July 9, 2023, 7:45pm

You need to create the pipeline in both clusters, as it is referenced by the index you restored but is not restored as part of that index.
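One way to do that from Python is sketched below. It assumes two already-connected elasticsearch-py clients (the parameter names are placeholders) and uses `ingest.get_pipeline` and `ingest.put_pipeline` with the same `body=` style as the first post; the function name `copy_pipeline` is hypothetical.

```python
def copy_pipeline(source_client, dest_client, pipeline_name):
    """Copy one ingest pipeline definition from one cluster to another.

    Both clients are assumed to be elasticsearch-py clients connected to
    cluster1 and cluster2 respectively.
    """
    # get_pipeline returns a mapping of {pipeline_id: definition}.
    definition = source_client.ingest.get_pipeline(id=pipeline_name)[pipeline_name]
    dest_client.ingest.put_pipeline(id=pipeline_name, body=dict(definition))
    return definition
```

Usage would look like `copy_pipeline(cluster1_client, cluster2_client, "write-pipeline")`. One thing to keep in mind: if the destination index really does carry `default_pipeline`, then once the pipeline exists on cluster2 its `set` processor would likely redirect the reindexed documents to my-idx-new-09-2023 on cluster2 rather than into restored_index, so it is worth checking where the documents land.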

Aditya_Teltia · July 9, 2023, 7:45pm

Settings of all three indices involved:

my-idx-09-2022

{
  "my-idx-09-2022" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "my-idx-09-2022",
        "default_pipeline" : "write-pipeline",
        "creation_date" : "1688930091820",
        "number_of_replicas" : "1",
        "uuid" : "KNapWcTaRQOFAuvwujkJsQ",
        "version" : {
          "created" : "7171099"
        }
      }
    }
  }
}

my-idx-new-09-2023

{
  "my-idx-new-09-2023" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "my-idx-new-09-2023",
        "creation_date" : "1688930632398",
        "number_of_replicas" : "1",
        "uuid" : "FXn1sM91QpKFWM4TiMZfYg",
        "version" : {
          "created" : "7171099"
        }
      }
    }
  }
}

restored_index

{
    "restored_idx": {
        "settings": {
            "index": {
                "routing": {
                    "allocation": {
                        "include": {
                            "_tier_preference": "data_content"
                        }
                    }
                },
                "number_of_shards": "1",
                "provided_name": "restored_idx",
                "creation_date": "1688930871828",
                "number_of_replicas": "1",
                "uuid": "f9Hf7J6HQMyxYml2NBnFcA",
                "version": {
                    "created": "7171099"
                }
            }
        }
    }
}

stephenb · July 9, 2023, 8:02pm

@Aditya_Teltia This ^^^ you need to create the ingest pipeline on the other cluster as well.

You have this setting in your source

"default_pipeline" : "write-pipeline",

So on the remote cluster that pipeline does not exist, therefore it cannot be found, and therefore you are getting the error.

When I do a snapshot and restore with a default_pipeline setting, that setting is carried over to the restored index:

PUT discuss-test-source/_settings
{
  "default_pipeline": "discuss-test-pipeline"
}

PUT _snapshot/found-snapshots/discuss-test-snap
{
  "indices": "discuss-test-source",
  "include_global_state": false
}

POST _snapshot/found-snapshots/discuss-test-snap/_restore
{
  "indices": "discuss-test-source",
  "rename_pattern": "discuss-test-(.+)",
  "rename_replacement": "restored-discuss-test-$1"
}

GET discuss-test-source

GET restored-discuss-test-source

# GET discuss-test-source 200 OK
{
  "discuss-test-source": {
    "aliases": {},
    "mappings": {
      "properties": {
        "foo": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        },
        "number_of_shards": "1",
        "provided_name": "discuss-test-source",
        "default_pipeline": "discuss-test-pipeline",
        "creation_date": "1688929720842",
        "number_of_replicas": "1",
        "uuid": "hWPBpD4PSn6nF9nxhTQrOg",
        "version": {
          "created": "8080199"
        }
      }
    }
  }
}
# GET restored-discuss-test-source 200 OK
{
  "restored-discuss-test-source": {
    "aliases": {},
    "mappings": {
      "properties": {
        "foo": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        },
        "number_of_shards": "1",
        "provided_name": "discuss-test-source",
        "default_pipeline": "discuss-test-pipeline",
        "creation_date": "1688929720842",
        "number_of_replicas": "1",
        "uuid": "GgVjSl8oT66ZSpH7wMFl9A",
        "version": {
          "created": "8080199"
        }
      }
    }
  }
}

Not sure why your restored does not show that ...

So if I try a reindex when the pipeline does not exist (I did this on a single cluster), I get the same error you are getting. So you need to put the pipeline on the remote cluster.

pipeline does not exist but is defined as the default_pipeline

POST _reindex
{
  "source": {
    "index": "discuss-test-source"
  },
  "dest": {
    "index": "restored-discuss-test-source"
  }
}

# Result

{
  "took": 6,
  "timed_out": false,
  "total": 1,
  "updated": 0,
  "created": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "restored-discuss-test-source",
      "id": "sEQOPIkBFT0mnAMUW-PM",
      "cause": {
        "type": "illegal_argument_exception",
        "reason": "pipeline with id [discuss-test-pipeline] does not exist"
      },
      "status": 400
    }
  ]
}

And again, just setting the _index is not a great practice. It works, but the reroute processor is better.
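Since the restore carries `default_pipeline` over to the restored index, another option, offered here only as an untested sketch, is to ask the restore to leave that setting out. The helper below is hypothetical; it builds a restore request body like the ones in this thread and adds the restore API's `ignore_index_settings` option. Whether this fits depends on whether you want the restored index to keep using the pipeline.

```python
def restore_body(index, restored_name, drop_default_pipeline=True):
    """Build a snapshot restore request body that renames `index` on restore."""
    body = {
        "indices": index,
        # As in the thread, the plain index name is used as the rename pattern.
        "rename_pattern": index,
        "rename_replacement": restored_name,
    }
    if drop_default_pipeline:
        # Assumption: this setting may be skipped on restore via ignore_index_settings.
        body["ignore_index_settings"] = ["index.default_pipeline"]
    return body


if __name__ == "__main__":
    print(restore_body("my-idx-09-2022", "restored_index"))
```

The resulting dict would be sent as the body of `POST /_snapshot/<repository>/<snapshot>/_restore`.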

system · August 6, 2023, 8:03pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
