Merge multiple indices into one

Hello,
We have an ES cluster that stores daily indices for ELB logs, like elb-2020.01.09.
Apart from Index Lifecycle Management (ILM), are there other ways to merge all elb-2020.01.* indices into a single elb-2020.01 index and then delete all the elb-2020.01.* indices?

I have the following reindex request with a Painless script:

POST _reindex?requests_per_second=115&wait_for_completion=true
{
  "source": {
    "index": "elb-2020.01.*",
    "size": 500
  },
  "dest": {
    "index": "elb-2020.01"
  },
  "script": {
    "lang": "painless",   
    "source": """
      ctx._source.index = ctx._index;
      def eventData = ctx._source["event.data"];
      if (eventData != null) {
        eventData.remove("realmDb.size");
        eventData.remove("realmDb.format");
        eventData.remove("realmDb.contents");
      }
    """
  }
}

This works, but looking at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-many-indices, it is recommended to reindex the indices one by one.
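
Since merging a whole month of dailies can take a while, one common pattern (not something this thread spells out) is to run each reindex asynchronously and poll the Tasks API for progress. A minimal sketch, assuming Elasticsearch on localhost:9200; the index names are examples and the task id in the second call is a placeholder you would copy from the first response:

# Run asynchronously; the response contains a task id like "abc123:456"
curl -s -X POST "localhost:9200/_reindex?wait_for_completion=false" \
  -H 'Content-Type: application/json' -d '{
  "source": { "index": "elb-2020.01.01" },
  "dest":   { "index": "elb-2020.01" }
}'

# Poll progress with the task id from the response above (placeholder here)
curl -s "localhost:9200/_tasks/abc123:456"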

I also get some errors, like:

{
  "took": 8100,
  "timed_out": false,
  "total": 42798,
  "updated": 0,
  "created": 498,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": 115,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "elb-2020.01",
      "type": "_doc",
      "id": "hy3xvGgBJbHuicWHFn_C",
      "cause": {
        "type": "illegal_argument_exception",
        "reason": "mapper [network.latency] cannot be changed from type [float] to [long]"
      },
      "status": 400
    },
    {
      "index": "elb-2020.01",
      "type": "_doc",
      "id": "9S6ex2gBJbHuicWHadlX",
      "cause": {
        "type": "illegal_argument_exception",
        "reason": "mapper [network.latency] cannot be changed from type [float] to [long]"
      },
      "status": 400
    }
  ]
}
  • Is this the simplest way to do this?
  • How do I deal with the errors?
  • For each daily index, should I disable replicas before merging into the monthly index?
  • Is there anything else I need to consider?

Any advice is much appreciated.

Hey,

Quick answers to the four questions at the end of your post:

  1. As you figured out, the reindex docs describe the easiest way to do this: reindex with a unique target index.
  2. That is one of the reasons the docs suggest reindexing per index. You need to look at each of the failures individually. Often it is a mismapped field, which can potentially be fixed with a small addition to the script, such as casting a value (see the sketch after this list). There is no one-size-fits-all solution, though.
  3. That does not matter in this case; you can leave it as is. You might get faster indexing if the new, big index has no replicas while indexing, and once you are done you can increase that setting again.
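
For the network.latency failures above, the casting idea from answer 2 could look like this. This is only a sketch: it assumes the value sits under a dotted top-level key (like the event.data lookup in the original script) and that coercing it to a float matches the destination mapping:

curl -s -X POST "localhost:9200/_reindex" \
  -H 'Content-Type: application/json' -d '{
  "source": { "index": "elb-2020.01.01" },
  "dest":   { "index": "elb-2020.01" },
  "script": {
    "lang": "painless",
    "source": "def lat = ctx._source[\"network.latency\"]; if (lat instanceof Number) { ctx._source[\"network.latency\"] = ((Number) lat).floatValue(); }"
  }
}'

Defining the mapping of the destination index up front avoids the problem entirely (see the last reply below).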

hope that helps.

One more general thought, which also touches your last question: if this is time-based data that will age out or be deleted over time, why not use the shrink API on the affected indices and simply end up with fewer shards going forward? Or will this data be around for a long time, and that is why you are optimizing here?
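
For reference, the shrink route looks roughly like this (node and index names are examples): first relocate one copy of every shard onto a single node and block writes, then shrink to fewer primary shards.

# 1) Put one copy of every shard on one node and make the index read-only
curl -s -X PUT "localhost:9200/elb-2020.01.09/_settings" \
  -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.routing.allocation.require._name": "shrink-node-1",
    "index.blocks.write": true
  }
}'

# 2) Shrink into a new single-shard index, clearing the temporary settings
curl -s -X POST "localhost:9200/elb-2020.01.09/_shrink/elb-2020.01.09-shrunk" \
  -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1,
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}'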

I have just taken over an ES cluster, and my first step is to optimize it: there are too many old indices, currently 2715 indices with 27118 shards ranging from 100.1kb to 1gb, so the cluster has gone into a YELLOW state and I need to fix this.

But thanks for the advice

  1. In Painless, is there a way to get all the indices matching a specific pattern and loop over them, or is a bash script simpler?
  2. When creating the destination index, how do I specify the number of replicas?

I'd go with a bash script.
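
A rough sketch of what that could look like, assuming Elasticsearch on localhost:9200 and the naming scheme from this thread: the _cat/indices call lists the dailies, and each one is reindexed one at a time, as the docs recommend.

#!/usr/bin/env bash
ES=localhost:9200

# List all January dailies, then reindex each into the monthly index
for idx in $(curl -s "$ES/_cat/indices/elb-2020.01.*?h=index"); do
  echo "reindexing $idx -> elb-2020.01"
  curl -s -X POST "$ES/_reindex?requests_per_second=115" \
    -H 'Content-Type: application/json' \
    -d "{\"source\":{\"index\":\"$idx\"},\"dest\":{\"index\":\"elb-2020.01\"}}"
  echo
done

Once every daily has been reindexed and the document counts verified, the elb-2020.01.* indices can be deleted.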

Regarding your other point: you should create the index and its mapping up front, before indexing into it, and that is also where you specify the number of replicas.
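
A minimal sketch of that, assuming a 7.x cluster (typeless mappings) and using the field from the failures above as an example: create the monthly index with an explicit mapping and no replicas, run the reindexing, then add replicas back.

# Create the destination index up front: explicit mapping, no replicas yet
curl -s -X PUT "localhost:9200/elb-2020.01" \
  -H 'Content-Type: application/json' -d '{
  "settings": { "index.number_of_replicas": 0 },
  "mappings": {
    "properties": {
      "network": { "properties": { "latency": { "type": "float" } } }
    }
  }
}'

# ... run the per-index reindexing ...

# Add replicas back once indexing is done
curl -s -X PUT "localhost:9200/elb-2020.01/_settings" \
  -H 'Content-Type: application/json' -d '{ "index.number_of_replicas": 1 }'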

Great, thanks!
