Reindex just one specific index in a data stream

We changed the field mappings in our index template. After a rollover, the new backing index works with the correct field types. We assumed we could just query smaller (more recent) time ranges so that all data would be read from the new index, but that is not enough: the correct types only apply to fields in the newly ingested data. So we just need to reindex one previous backing index. Is there any way to do that and avoid reindexing the whole data stream?

Welcome to our community! :smiley:

If you know the names of the underlying backing indices, you can run a manual reindex back into the data stream and then delete the old indices.
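For example, a manual reindex of one backing index back into the data stream could look something like this (the index names here are just placeholders); note that a reindex into a data stream has to use an op_type of create:

POST _reindex
{
  "source": {
    "index": ".ds-mydatastream-2022.10.03-000001"
  },
  "dest": {
    "index": "mydatastream",
    "op_type": "create"
  }
}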

If you reindex old data into the data stream, will that not mess up the retention period for that data?

Retention is based on index age, so yes, it will change it a bit, as a newer index will hold older data.

I guess it comes back to how important it is to have that data around and whether they are able to hold onto it for a bit longer.

@warkolm thanks for the recommendation. I guess the idea is to clone the index I need to reindex into some separate space, and then reindex it directly into the data stream as new data, but with the old timestamps in the log entries. Did I get you right?
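Roughly what I have in mind (index names are just placeholders), given that an index has to be write-blocked before it can be cloned:

PUT .ds-mydatastream-2022.10.03-000001/_settings
{
  "index.blocks.write": true
}

POST .ds-mydatastream-2022.10.03-000001/_clone/mydatastream-clone

And then reindex from mydatastream-clone into the data stream, deleting the clone afterwards.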

That still won't change the retention of the data, because the index itself will be newer than the original timestamps.

@warkolm thanks. It mostly works for me. The previous attempt was on a test data stream, and I tried to reindex an open index; that's why I assumed a cloning step was needed. But I encountered an error.

{
  "completed" : true,
  "task" : {
    "node" : "my-node-id",
    "id" : 31873238,
    "type" : "transport",
    "action" : "indices:data/write/reindex",
    "status" : {
      "total" : 91856040,
      "updated" : 0,
      "created" : 45873998,
      "deleted" : 0,
      "batches" : 45874,
      "version_conflicts" : 0,
      "noops" : 0,
      "retries" : {
        "bulk" : 0,
        "search" : 0
      },
      "throttled_millis" : 0,
      "requests_per_second" : -1.0,
      "throttled_until_millis" : 0
    },
    "description" : "reindex from [.myindex] to [mydatastream][_doc]",
    "start_time_in_millis" : 1667470198137,
    "running_time_in_nanos" : 74895189349592,
    "cancellable" : true,
    "cancelled" : false,
    "headers" : { }
  },
  "response" : {
    "took" : 74895188,
    "timed_out" : false,
    "total" : 91856040,
    "updated" : 0,
    "created" : 45873998,
    "deleted" : 0,
    "batches" : 45874,
    "version_conflicts" : 0,
    "noops" : 0,
    "retries" : {
      "bulk" : 0,
      "search" : 0
    },
    "throttled" : "0s",
    "throttled_millis" : 0,
    "requests_per_second" : -1.0,
    "throttled_until" : "0s",
    "throttled_until_millis" : 0,
    "failures" : [
      {
        "index" : ".newindex",
        "type" : "_doc",
        "id" : "mydocid",
        "cause" : {
          "type" : "mapper_parsing_exception",
          "reason" : "failed to parse",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "Limit of total fields [1000] has been exceeded while adding new fields [2]"
          }
        },
        "status" : 400
      },
      {
        "index" : ".myindex",
        "type" : "_doc",
        "id" : "mydocid",
        "cause" : {
          "type" : "mapper_parsing_exception",
          "reason" : "failed to parse",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "Limit of total fields [1000] has been exceeded while adding new fields [2]"
          }
        },
        "status" : 400
      }
    ]
  }
}
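The status above is what the tasks API returned for the reindex once it had finished; I retrieved it with something like this (node id and task id are the ones shown in the output):

GET _tasks/my-node-id:31873238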

Is there any way to continue on errors and just log them? I had reindexed only about 180 GB of a 280 GB index, and this was not the first error. I had previously deleted the old data from the new indices (identified as duplicates), made some changes to the field mappings, and started a new reindex task.

Your best option there would be to increase the mapping limit for the index so you can complete the reindex. Otherwise you could try ignore_malformed | Elasticsearch Guide [8.5] | Elastic.
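For example, raising the limit on the backing index that is rejecting documents (.newindex in your output) could look like this; 2000 is just an example value, and index.mapping.total_fields.limit is a dynamic setting, so it can be changed on an existing index:

PUT .newindex/_settings
{
  "index.mapping.total_fields.limit": 2000
}

You would also want the higher limit in the index template so it survives the next rollover.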

@warkolm thank you. You helped a lot.
