Merge multiple indices in one

Norman_Khine · February 23, 2020, 11:41am

Hello,
We have an ES cluster that stores daily indices of ELB logs, such as elb-2020.01.09.
Apart from Index Lifecycle Management, are there other ways to merge all elb-2020.01.* indices into a single elb-2020.01 index and then delete all the elb-2020.01.* indices?

I have the following reindex request with a Painless script:

POST _reindex?requests_per_second=115&wait_for_completion=true
{
  "source": {
    "index": "elb-2020.01.*",
    "size": 500
  },
  "dest": {
    "index": "elb-2020.01"
  },
  "script": {
    "lang": "painless",   
    "source": """
      ctx._source.index = ctx._index;
      def eventData = ctx._source["event.data"];
      if (eventData != null) {
        eventData.remove("realmDb.size");
        eventData.remove("realmDb.format");
        eventData.remove("realmDb.contents");
      }
    """
  }
}

This works, but according to https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-many-indices it is recommended to reindex the indices one by one.

I also get some errors, such as:

{
  "took": 8100,
  "timed_out": false,
  "total": 42798,
  "updated": 0,
  "created": 498,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": 115,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "elb-2020.01",
      "type": "_doc",
      "id": "hy3xvGgBJbHuicWHFn_C",
      "cause": {
        "type": "illegal_argument_exception",
        "reason": "mapper [network.latency] cannot be changed from type [float] to [long]"
      },
      "status": 400
    },
    {
      "index": "elb-2020.01",
      "type": "_doc",
      "id": "9S6ex2gBJbHuicWHadlX",
      "cause": {
        "type": "illegal_argument_exception",
        "reason": "mapper [network.latency] cannot be changed from type [float] to [long]"
      },
      "status": 400
    }
  ]
}

1. Is there a simpler way to do this?
2. How do I deal with the errors?
3. For each daily index, should I disable replicas before merging it into the monthly index?
4. Is there anything else I need to consider?

Any advice is much appreciated.

spinscale · February 24, 2020, 9:42am

Hey,

Quick answers to your last four questions:

1. As you figured out, the reindex docs describe the easiest way to do this: using reindex with a unique target index.
2. Errors like these are one of the reasons the docs suggest reindexing per index. You need to look at each of the failures individually. Often it is a mismapped field, which can potentially be fixed with a small addition to the script, such as casting a variable. There is no one-size-fits-all solution, though.
3. That does not matter in this case; you can leave the daily indices as they are. Indexing may be faster if the new big index has no replicas while you are indexing; once you are done, you can increase that setting.
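For the second point, here is a minimal sketch of what "casting a variable" in the script could look like. Python is used only to build the reindex request body. It assumes that `network.latency` is stored as a nested `network` object with a `latency` key and that forcing it to a floating-point value is the right fix for this data; neither is confirmed in this thread. `CAST_LATENCY` and `build_reindex_body` are hypothetical names.

```python
import json

# Painless snippet (assumption: "network" is an object holding "latency").
# It rewrites numeric latency values as doubles so every document
# carries the same numeric kind for that field.
CAST_LATENCY = """
  def network = ctx._source['network'];
  if (network != null && network['latency'] instanceof Number) {
    network['latency'] = ((Number) network['latency']).doubleValue();
  }
"""


def build_reindex_body(source_index, dest_index, script_source=CAST_LATENCY):
    """Return the JSON body for POST _reindex for one source index."""
    return {
        "source": {"index": source_index, "size": 500},
        "dest": {"index": dest_index},
        "script": {"lang": "painless", "source": script_source},
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a console and reviewed first.
    print(json.dumps(build_reindex_body("elb-2020.01.09", "elb-2020.01"), indent=2))
```

Whether a cast is enough depends on the actual failure; a document where the field is a string, for example, would pass through this script unchanged.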

Hope that helps.

spinscale · February 24, 2020, 9:44am

One more thing in general: if this is time-based data that will age out or be removed over time, why not use the shrink API on the affected indices and then just have fewer indices in the future? Or will this data be around for a long time, which is why you are optimizing here?

Norman_Khine · February 24, 2020, 10:07am

I have just taken over an ES cluster, and my first step is to optimize it. The cluster has too many old indices: currently 2715 indices with 27118 shards, ranging in size from 100.1kb to 1gb. As a result, the cluster has gone into a YELLOW state, and I need to fix this.

But thanks for the advice.

Norman_Khine · February 24, 2020, 10:19am

In Painless, is there a way to get all the indices matching a specific regex and loop over them, or is a bash script simpler?
When creating the destination index, how do I specify the number of replicas?

spinscale · February 24, 2020, 11:04am

I'd go with a bash script.
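The same one-index-at-a-time loop can also be sketched in Python instead of bash. This is only a sketch under stated assumptions: the cluster URL is a placeholder, the daily indices follow the `elb-YYYY.MM.DD` pattern from the question, and the helper names (`daily_indices`, `reindex_request`, `reindex_month`) are hypothetical. It stops at the first response that reports failures so each one can be inspected.

```python
import calendar
import json
import urllib.error
import urllib.request


def daily_indices(prefix, year, month):
    """List the daily index names for one month, e.g. elb-2020.01.01."""
    days = calendar.monthrange(year, month)[1]
    return [f"{prefix}-{year:04d}.{month:02d}.{day:02d}" for day in range(1, days + 1)]


def reindex_request(es_url, source_index, dest_index):
    """Build (but do not send) a POST _reindex request for one daily index."""
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    return urllib.request.Request(
        f"{es_url}/_reindex?wait_for_completion=true",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reindex_month(es_url, prefix, year, month):
    """Reindex each daily index into the monthly one, stopping on failures."""
    dest = f"{prefix}-{year:04d}.{month:02d}"
    for source in daily_indices(prefix, year, month):
        try:
            with urllib.request.urlopen(reindex_request(es_url, source, dest)) as resp:
                result = json.load(resp)
        except urllib.error.HTTPError as err:
            # Non-2xx responses (e.g. a missing daily index) still carry a JSON body.
            result = json.load(err)
        if result.get("failures") or result.get("error"):
            print(f"{source}: stopped, inspect the response: {json.dumps(result)[:500]}")
            return False
        print(f"{source}: created={result.get('created')} total={result.get('total')}")
    return True


if __name__ == "__main__":
    # Placeholder URL (assumption); point this at the real cluster.
    reindex_month("http://localhost:9200", "elb", 2020, 1)
```

Deleting the daily indices is deliberately left out, so that document counts in the monthly index can be checked first.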

Regarding your other point, you should create the index and the mapping upfront before indexing into that index. And that's also where you specify the number of replicas.
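As a rough sketch of creating the index up front, the request bodies could be built like this. It assumes a typeless mapping (as in Elasticsearch 7.x), that `network.latency` should be a `float`, and that the final replica count is 1; none of these is stated in the thread. The function names are hypothetical. The first body is meant for `PUT /elb-2020.01` before reindexing, the second for `PUT /elb-2020.01/_settings` afterwards.

```python
import json


def create_index_body(replicas=0):
    """Body for PUT /<index>: no replicas while bulk indexing, explicit mapping."""
    return {
        "settings": {"index": {"number_of_replicas": replicas}},
        "mappings": {
            "properties": {
                # Assumption: latency should be a float in the monthly index.
                "network": {"properties": {"latency": {"type": "float"}}}
            }
        },
    }


def replicas_update_body(replicas):
    """Body for PUT /<index>/_settings once reindexing has finished."""
    return {"index": {"number_of_replicas": replicas}}


if __name__ == "__main__":
    print("PUT /elb-2020.01")
    print(json.dumps(create_index_body(), indent=2))
    print("PUT /elb-2020.01/_settings")
    print(json.dumps(replicas_update_body(1), indent=2))
```

Mapping the field explicitly means the destination index no longer relies on dynamic mapping to guess its type from whichever document arrives first.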

Norman_Khine · February 24, 2020, 11:13am

Great, thanks.

system · March 23, 2020, 11:13am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
