What's the recommended strategy to reindex all (over 500) indices from 5.6.3 to 7.x?

I have to migrate from ES 5.6.3 to the latest ES 7.x. There are over 500 indices in 10 flavors; the count is high because of rollover (e.g. index1-yyyy.mm and so on). I see two options:

  1. Scripting, e.g. a bash script connecting to the new cluster:
    read all index names from a file (produced separately by connecting to the 5.x cluster)
    loop over the names
    create the mapping
    reindex from remote

Question: can indices of the same flavor be reindexed with the reindex API in a single command, with both the source and dest index given as "index1-*"?

  2. The same logic in a Java program using the High Level REST Client.

Since the servers available to me run Windows, I am having difficulty setting up a bash script. If there is a way to do it in one shot for indices of the same flavor, I would prefer that, since it is a one-time job.
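Since the servers run Windows, the loop in option 1 could be written in Python instead of bash. The following is only a sketch under assumptions: the remote host URL, the helper names `flavor_of`, `build_reindex_body` and `plan`, and the sample index names are hypothetical, and the index list is assumed to have been exported from the 5.x cluster beforehand, one name per line. It builds one reindex body per index so that each destination keeps its source name without any script.

```python
import json
from collections import defaultdict

# Hypothetical address of the old 5.6.3 cluster; adjust to your environment.
OLD_CLUSTER = "http://old-cluster:9200"


def flavor_of(index_name):
    """Return the part before the last '-', e.g. 'audit' for 'audit-2018.01'."""
    return index_name.rsplit("-", 1)[0]


def build_reindex_body(index_name, remote_host=OLD_CLUSTER):
    """Build a reindex-from-remote body that keeps the source index name."""
    return {
        "source": {"remote": {"host": remote_host}, "index": index_name},
        "dest": {"index": index_name},
    }


def plan(index_names):
    """Group index names by flavor and build one request body per index."""
    grouped = defaultdict(list)
    for name in index_names:
        name = name.strip()
        if name:  # skip blank lines from the exported file
            grouped[flavor_of(name)].append(build_reindex_body(name))
    return dict(grouped)


if __name__ == "__main__":
    # Sample names standing in for the file exported from the 5.x cluster.
    sample = ["audit-2018.01", "audit-2018.02", "index1-2019.07"]
    for flavor, bodies in plan(sample).items():
        print(flavor, len(bodies))
        print(json.dumps(bodies[0], indent=2))
```

Each body would then be sent with POST _reindex to the new cluster once the mapping exists there; creating the mapping is left out of this sketch.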

Not sure I fully understand but did you look at the reindex API?

Reindex API supports reindexing from a remote Elasticsearch cluster:

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}
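If no console or curl is at hand, the same request could be sent from Python's standard library. This is a sketch only: `NEW_CLUSTER`, `make_reindex_request` and `send_reindex` are hypothetical names, and the hosts and credentials are the placeholders from the example above. The request goes to the new cluster, which then pulls the documents from the remote one.

```python
import json
import urllib.request

# Hypothetical address of the new 7.x cluster that runs the reindex.
NEW_CLUSTER = "http://localhost:9200"


def make_reindex_request(body, cluster_url=NEW_CLUSTER):
    """Wrap a reindex body in a POST request for the _reindex endpoint."""
    return urllib.request.Request(
        url=cluster_url.rstrip("/") + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_reindex(request):
    """Send the request and return the decoded JSON response (needs a live cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Same body as the console example, with placeholder host and credentials.
body = {
    "source": {
        "remote": {
            "host": "http://otherhost:9200",
            "username": "user",
            "password": "pass",
        },
        "index": "source",
        "query": {"match": {"test": "data"}},
    },
    "dest": {"index": "dest"},
}

request = make_reindex_request(body)
# result = send_reindex(request)  # uncomment once both clusters are reachable
```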

I think you can also reindex multiple indices with a wildcard; see https://www.elastic.co/guide/en/elasticsearch/reference/7.2/docs-reindex.html#_reindex_daily_indices

You can also try Curator

Thanks @ylasri and @dadoonet. I tried reindex using a wildcard with a Painless script. I have many indices, e.g. audit-2018.01, audit-2018.02 and so on, and want to create indices with the same names in another instance. What's the correct way to reference the remote index in the script?
POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "remote": {
      "host": "http://localhost:9200"
    },
    "index": "audit*",
    "size": 1000
  },
  "dest": {
    "index": "audit",
    "op_type": "create"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._index = 'audit' + (remote._index.substring('audit'.length(), remote._index.length()))"
  }
}

Try this with one small index:

POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "remote": {
      "host": "http://localhost:9200",
      "username": "elastic",
      "password": "changeme"
    },
    "index": "audit-*",
    "size": 1
  },
  "dest": {
    "index": "audit",
    "op_type": "create"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._index = 'audit-' + (ctx._index.substring('audit-'.length(), ctx._index.length()))"
  }
}

Add this setting to the elasticsearch.yml file of the cluster that runs the _reindex request (the destination cluster, not the remote source) and restart it:

reindex.remote.whitelist: localhost:9200

@ylasri thank you, it does create multiple indices. An interesting observation, though (and my issue remains):
POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "remote": {
      "host": "http://localhost:9200"
    },
    "index": "audit-2010.*",
    "size": 100
  },
  "dest": {
    "index": "audit-2010",
    "op_type": "create"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._index = 'audit-2010-' + (ctx._index.substring('audit-2010.'.length(), ctx._index.length()))"
  }
}
The request above creates multiple indices (audit-2010-01, audit-2010-02), but the request below creates just one index. The only difference is a dot instead of a dash in the script prefix. My original indices are named like audit-2010.01, audit-2010.02, and I want to keep the same index names in the new instance.

POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "remote": {
      "host": "http://localhost:9200"
    },
    "index": "audit-2010.*",
    "size": 100
  },
  "dest": {
    "index": "audit-2010",
    "op_type": "create"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._index = 'audit-2010.' + (ctx._index.substring('audit-2010.'.length(), ctx._index.length()))"
  }
}

Any reason why?

Can you share the index name created by the second script?

@ylasri the index created is audit-2010

Try the pattern "index": "audit-2010*" instead of "index": "audit-2010.*"; maybe the issue is with the special dot :)

@ylasri selecting with "index": "audit-2010*" makes no difference. The question is how to escape the dot in Painless. Escaping it with \. gives an error:
"source": "ctx._index = 'audit-2010\.' + (ctx._index.substring('....

@dadoonet - David, this works in Painless:
"source": "ctx._index = (ctx._index.substring(1, ctx._index.length()))"

and this does not:
"source": "ctx._index = (ctx._index.substring(0, ctx._index.length()))"

I do not get any syntax error, but reindex creates the default index name (the one given in dest) when I use the latter.
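To see what these scripts compute, the Painless expressions can be mirrored in Python, where `name[a:b]` plays the role of `substring(a, b)`. The helper names below are hypothetical, and the sample name audit-2010.01 is taken from the thread. The dot is an ordinary character in these string operations (no regular expression is involved), so the sketch leaves it unescaped. It shows that the dot variant and `substring(0, ...)` both reproduce the source index name exactly, while the dash variant and `substring(1, ...)` produce a different name. One possible reading, not confirmed in this thread, is that reindex treats a `ctx._index` equal to the source name as unchanged and therefore writes to the index given in dest.

```python
def rename_with(prefix, source_index, strip="audit-2010."):
    """Mirror: ctx._index = prefix + ctx._index.substring(strip.length(), ctx._index.length())"""
    return prefix + source_index[len(strip):len(source_index)]


def substring_from(start, source_index):
    """Mirror: ctx._index = ctx._index.substring(start, ctx._index.length())"""
    return source_index[start:len(source_index)]


source = "audit-2010.01"

dash_name = rename_with("audit-2010-", source)  # differs from the source name
dot_name = rename_with("audit-2010.", source)   # identical to the source name

# Each line prints the computed name and whether it equals the source name.
print(dash_name, dash_name == source)
print(dot_name, dot_name == source)
print(substring_from(1, source), substring_from(1, source) == source)
print(substring_from(0, source), substring_from(0, source) == source)
```

If that reading holds, sending one reindex request per index with the wanted name directly in dest would sidestep the script altogether.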

Painful Painless :)
Hi @nik9000, we need your help here if possible.

@ylasri LOL - thanks for your quick responses.
@nik9000 your help is awaited.
