Which is more efficient: re-ingesting the raw .ndjson files or using the _reindex API?

Hello,
Currently we back up our daily indexes to AWS S3 in raw .ndjson format, and I wanted an opinion on whether it would be better to re-ingest these files into a monthly index or to use the _reindex API to do this.

So my options would be:

  • get all the indexes with last month's dates, load them into Firehose and then ingest them into an index-YYYY.MM index

  • use the reindex API like:

POST _reindex?requests_per_second=115&wait_for_completion=true
{
  "source": {
    "index": "index-YYYY.MM.*",
    "size": 1000
  },
  "dest": {
    "index": "index-YYYY.MM"
  },
  "conflicts": "proceed", 
  "script": {
    "lang": "painless",   
    "source": """
      ctx._source.index = ctx._index;
    """
  }
}
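Either option needs last month's index names filled in for YYYY.MM. A minimal Python sketch of that step, which also assembles the same reindex body as above: the helper names (last_month_names, build_reindex_body) and the "index-" prefix default are placeholders of mine, and actually sending the request is left out.

```python
import json
from datetime import date, timedelta
from typing import Tuple


def last_month_names(today: date, prefix: str = "index-") -> Tuple[str, str]:
    """Return (source_pattern, dest_index) for the month before `today`."""
    # Stepping back one day from the 1st always lands in the previous month,
    # which also handles the January -> December year rollover.
    last_day_prev = today.replace(day=1) - timedelta(days=1)
    month = last_day_prev.strftime("%Y.%m")
    return f"{prefix}{month}.*", f"{prefix}{month}"


def build_reindex_body(source_pattern: str, dest_index: str,
                       batch_size: int = 1000) -> dict:
    """Build the _reindex request body shown in the post."""
    return {
        "source": {"index": source_pattern, "size": batch_size},
        "dest": {"index": dest_index},
        "conflicts": "proceed",
        "script": {
            "lang": "painless",
            # Keep the name of the original daily index on each document.
            "source": "ctx._source.index = ctx._index;",
        },
    }


if __name__ == "__main__":
    src, dst = last_month_names(date.today())
    print(json.dumps(build_reindex_body(src, dst), indent=2))
```

The source pattern ends in ".*", so it only matches the daily indexes and not the monthly destination index itself.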

Apart from complexity, additional moving parts and data transfer, is there any value in using the first option?

Any advice is much appreciated.

Hey,

From my perspective, reindex sounds good. The only difference (leaving aside the complexity of having a loader component) is that with reindex you will have a slightly higher load on the Elasticsearch cluster, because it has to retrieve those documents before indexing them again, so both steps (loading and indexing) happen on the ES cluster side.

Not a deal breaker from my perspective though.
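To put the throttle from the request in the question into numbers: as far as I understand it, requests_per_second effectively caps documents per second, and reindex enforces it by padding the wait between scroll batches. A rough Python sketch of that arithmetic follows; the 10 million document count is made up for illustration, and the function names are mine.

```python
def min_reindex_seconds(total_docs: int, requests_per_second: float) -> float:
    """Lower bound on the runtime of a throttled reindex, in seconds."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return total_docs / requests_per_second


def target_seconds_per_batch(batch_size: int, requests_per_second: float) -> float:
    """Time each scroll batch is stretched to so the throttle holds."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return batch_size / requests_per_second


if __name__ == "__main__":
    docs = 10_000_000  # hypothetical size of one month of daily indexes
    hours = min_reindex_seconds(docs, 115) / 3600
    print(f"at least {hours:.1f} hours for {docs} documents")
    print(f"about {target_seconds_per_batch(1000, 115):.1f} s per batch of 1000")
```

With these assumed numbers the run takes roughly a day, so the throttle keeps the extra load low at the cost of a long-running task. Raising requests_per_second trades that the other way.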

--Alex
