Remote reindex with wildcard to multiple indices

Hey Everyone,

I'm having some issues running a remote reindex from cluster A to cluster B. The end goal is for the user to provide an index pattern, and for the remote reindex to fetch every index matching that pattern from the remote cluster and create it, under the same name, on the local cluster.

This should include having op_type set to "create", as it will be used to sync the clusters after a failure. I tried taking the script provided by Elastic here: Reindex API | Elasticsearch Guide [8.5] | Elastic, and just removing the part where it appends a minus (-) to the index name, but that ends up failing.

This is what the script looks like currently:

POST _reindex
{
  "source": {
    "remote": {
      "host": "https://remote_host:9200",
      "username": "elastic",
      "password": "SomePass"
    },
    "index": "source-*"
  },
  "dest": {
    "index": "source",
    "op_type": "create"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._index = 'source-' + (ctx._index.substring('source-'.length(), ctx._index.length()))"
  }
}

One more thing: I'm not sure whether creating missing indices on the destination will break ILM. Do the reindexed indices still contain the metadata ILM uses to tell the indices apart (their age, their order, and so on)?

I'm aware some people have written scripts for this in bash and so on, but I would primarily like to know if there's a way to do this purely with Elasticsearch and its API. If not, I can write the logic used in the bash scripts myself in Ansible.

Thanks in advance for any help!

Failing how? It helps if you share the response from Elasticsearch, and any relevant logs.

Nope.

Sorry for not giving an example. There is no actual error. What ends up happening is that all the data from the source indices gets written to one destination index. In this example it would be "source".

Only when I make the index name different from the source do they get replicated semi-correctly. For example, by adding a minus at the end, or any other string. By semi-correctly I mean the data is correct, but of course the names of the indices are different, which is not acceptable in this situation.

Regarding the second question, is there any way to make ILM work in this situation as on the source cluster? If it's not possible in the basic license, would it be possible using CCR?

Thanks!

Right, 'cause that's what you told it to do :)

There's currently no easy way to reindex multiple source indices into multiple destination ones. It's a DIY process using a for loop or something similar in external code.

If you reindex into a write alias, ILM will apply its policy, but it will treat the data as new and will not factor in the existing ILM state of the source indices.
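The external loop can be sketched in Python. This is only a minimal sketch under some assumptions, not an official Elastic tool: the helper name build_reindex_requests is hypothetical, the list of remote index names is assumed to have been fetched already (for example with GET _cat/indices/source-*?h=index against the remote cluster), and "conflicts": "proceed" is added on the assumption that documents already present on the destination should be skipped rather than abort the run.

```python
import fnmatch
import json


def build_reindex_requests(remote_indices, pattern, remote, skip_conflicts=True):
    """Build one _reindex request body per remote index matching `pattern`.

    `remote_indices` is a list of index names that exist on the remote
    cluster. `remote` is the same "remote" object used in the request
    above (host, username, password). Hypothetical helper for illustration.
    """
    bodies = []
    for name in sorted(remote_indices):
        # Keep only indices that match the user-supplied wildcard pattern.
        if not fnmatch.fnmatchcase(name, pattern):
            continue
        body = {
            "source": {"remote": remote, "index": name},
            # Same name on the destination; "create" never overwrites documents.
            "dest": {"index": name, "op_type": "create"},
        }
        if skip_conflicts:
            # Assumption: already-synced documents should be skipped, not fatal.
            body["conflicts"] = "proceed"
        bodies.append(body)
    return bodies


if __name__ == "__main__":
    remote = {
        "host": "https://remote_host:9200",
        "username": "elastic",
        "password": "SomePass",
    }
    names = ["source-2023.01.01", "source-2023.01.02", "unrelated-index"]
    for body in build_reindex_requests(names, "source-*", remote):
        print(json.dumps(body, indent=2))
```

Each body would then be sent as its own POST _reindex request to the local cluster, so every destination index keeps the name of its source and no script is needed.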

CCR might work though.

Have you considered using the snapshot and restore APIs? These retain the index settings, but they copy the indices exactly as they are and do not allow you to change mappings, which you can do when reindexing.
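For reference, a restore limited to the same index pattern might look like the sketch below. It assumes the snapshot repository is already registered on the destination cluster. The repository name, the snapshot name, and the helper build_restore_request are all hypothetical placeholders.

```python
import json


def build_restore_request(repository, snapshot, pattern):
    """Return the path and body for a restore of indices matching `pattern`.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    path = f"/_snapshot/{repository}/{snapshot}/_restore"
    body = {
        # Restore only the indices matching the wildcard pattern.
        "indices": pattern,
        # Leave cluster-wide state on the destination untouched.
        "include_global_state": False,
    }
    return path, body


if __name__ == "__main__":
    path, body = build_restore_request("my_repo", "my_snapshot", "source-*")
    print("POST", path)
    print(json.dumps(body, indent=2))
```

The restored indices keep their original names and settings, which is the property that makes this route attractive compared with reindexing.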

Yeah I figured snapshot and restore would be my next best bet. Just wanted to confirm if there's a simpler way before doing that.

The potential issue with snapshot and restore would be the time it takes to complete. The cluster is quite large, with over 2 TB ingested daily. I'll see if I can make it work. Thank you for the help, Mark and Christian.

Cheers!

Using snapshot and restore will likely be faster and require fewer resources than reindexing.