Which is more efficient: re-ingesting the raw .ndjson or using the _reindex API?

Norman_Khine · March 10, 2020, 10:18am

Hello,
Currently we back up our daily indexes to AWS S3 in raw .ndjson format, and I wanted an opinion on whether it would be better to re-ingest these files into a monthly index or use the _reindex API to do this.

So my options would be:

1. Get all the indexes with last month's dates, load them into Firehose, and then ingest them into the index-YYYY.MM index.
2. Use the _reindex API, like:

POST _reindex?requests_per_second=115&wait_for_completion=true
{
  "source": {
    "index": "index-YYYY.MM.*",
    "size": 1000
  },
  "dest": {
    "index": "index-YYYY.MM"
  },
  "conflicts": "proceed", 
  "script": {
    "lang": "painless",   
    "source": """
      ctx._source.index = ctx._index;
    """
  }
}
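For a scheduled monthly run, the YYYY.MM placeholders in the request above would need to be filled in. A minimal Python sketch of that step follows; the helper names `previous_month` and `build_reindex_body` and the `index` prefix are assumptions for illustration, and the code only builds the request body without sending it.

```python
from datetime import date, timedelta


def previous_month(today: date) -> str:
    """Return the month before `today` formatted as 'YYYY.MM'."""
    # Stepping back one day from the first of the month lands in the previous month.
    last_day_of_previous = today.replace(day=1) - timedelta(days=1)
    return last_day_of_previous.strftime("%Y.%m")


def build_reindex_body(month: str, prefix: str = "index", batch_size: int = 1000) -> dict:
    """Build a _reindex body that folds the daily indexes of `month` into one monthly index."""
    return {
        # The trailing ".*" only matches names with a day suffix, so the
        # monthly destination index is not part of the source pattern.
        "source": {"index": f"{prefix}-{month}.*", "size": batch_size},
        "dest": {"index": f"{prefix}-{month}"},
        "conflicts": "proceed",
        "script": {
            "lang": "painless",
            # Keep the name of the original daily index on each document.
            "source": "ctx._source.index = ctx._index;",
        },
    }


if __name__ == "__main__":
    month = previous_month(date(2020, 3, 10))
    print(build_reindex_body(month))
```

The resulting dictionary can be serialized as JSON and sent as the body of the POST _reindex request.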

Apart from complexity, additional moving parts, and data transfer, is there any value in using the first option?

Any advice is much appreciated.

spinscale · March 10, 2020, 1:19pm

Hey,

From my perspective, reindex sounds good. Leaving aside the complexity of having a loader component, the only difference is that with reindex you will have a slightly higher load on the Elasticsearch cluster, because it has to retrieve those documents before indexing them again, so both steps (loading and indexing) happen on the ES cluster side.

Not a deal breaker from my perspective though.
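For comparison, the loader route only writes to the cluster, but something outside it has to turn the backup files into indexing requests. Here is a rough Python sketch of that conversion, assuming each line of the .ndjson backup is a bare document source (the thread does not state the exact file layout); the function names `to_bulk_lines` and `bulk_body` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def to_bulk_lines(ndjson_lines: Iterable[str], dest_index: str) -> Iterator[str]:
    """Yield _bulk API lines: an action line followed by the document source."""
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the backup file
        doc = json.loads(line)  # fails early on a corrupt line
        yield json.dumps({"index": {"_index": dest_index}})
        yield json.dumps(doc)


def bulk_body(ndjson_lines: Iterable[str], dest_index: str) -> str:
    """Join the lines into a _bulk request body, which must end with a newline."""
    return "\n".join(to_bulk_lines(ndjson_lines, dest_index)) + "\n"


if __name__ == "__main__":
    sample = ['{"message": "a"}', "", '{"message": "b"}']
    print(bulk_body(sample, "index-2020.02"), end="")
```

In practice the body would be sent in batches to the _bulk endpoint, and that batching and transfer is the extra moving part that reindex avoids.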

--Alex

system · April 7, 2020, 1:19pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
