Reindex just one specific index in a data stream

We changed the field mappings in our index template. After a rollover, the new backing index works with the correct field types. We assumed we could just query smaller (more recent) time ranges so that all data would be read from the new index, but that is not enough: the correct types only apply to fields in the newly ingested data. So we just need to reindex one previous backing index. Is there any way to do that and avoid reindexing the whole data stream?

Welcome to our community! :smiley:

If you know the names of the underlying backing indices, you can run a manual reindex back into the data stream and then delete the old indices.
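For example, a manual reindex of one backing index back into the data stream could look something like this (the index names here are just placeholders); note that a reindex into a data stream has to use an op_type of create:

POST _reindex
{
  "source": {
    "index": ".ds-mydatastream-2022.10.03-000001"
  },
  "dest": {
    "index": "mydatastream",
    "op_type": "create"
  }
}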

If you reindex old data into the data stream, will that not mess up the retention period for that data?

Retention is based on index age, so yes, it will change it a bit, as a newer index will hold older data.

I guess it comes back to how important it is to have that data around and whether they are able to hold onto it for a bit longer.

@warkolm thanks for the recommendation. I guess the idea is to clone the index I need to reindex into some separate space, and then reindex it directly into the data stream as new data, but with the old timestamps in the log entries. Did I get you right?
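Roughly what I have in mind (index names are just placeholders), given that an index has to be write-blocked before it can be cloned:

PUT .ds-mydatastream-2022.10.03-000001/_settings
{
  "index.blocks.write": true
}

POST .ds-mydatastream-2022.10.03-000001/_clone/mydatastream-clone

And then reindex from mydatastream-clone into the data stream, deleting the clone afterwards.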

That still won't change the retention of the data, because the index itself will be newer than the original timestamps.

@warkolm thanks. It mostly works for me. The previous attempt was on a test data stream, and I tried to reindex an open index; that's why I assumed a cloning step was needed. But I encountered an error.

{
  "completed" : true,
  "task" : {
    "node" : "my-node-id",
    "id" : 31873238,
    "type" : "transport",
    "action" : "indices:data/write/reindex",
    "status" : {
      "total" : 91856040,
      "updated" : 0,
      "created" : 45873998,
      "deleted" : 0,
      "batches" : 45874,
      "version_conflicts" : 0,
      "noops" : 0,
      "retries" : {
        "bulk" : 0,
        "search" : 0
      },
      "throttled_millis" : 0,
      "requests_per_second" : -1.0,
      "throttled_until_millis" : 0
    },
    "description" : "reindex from [.myindex] to [mydatastream][_doc]",
    "start_time_in_millis" : 1667470198137,
    "running_time_in_nanos" : 74895189349592,
    "cancellable" : true,
    "cancelled" : false,
    "headers" : { }
  },
  "response" : {
    "took" : 74895188,
    "timed_out" : false,
    "total" : 91856040,
    "updated" : 0,
    "created" : 45873998,
    "deleted" : 0,
    "batches" : 45874,
    "version_conflicts" : 0,
    "noops" : 0,
    "retries" : {
      "bulk" : 0,
      "search" : 0
    },
    "throttled" : "0s",
    "throttled_millis" : 0,
    "requests_per_second" : -1.0,
    "throttled_until" : "0s",
    "throttled_until_millis" : 0,
    "failures" : [
      {
        "index" : ".newindex",
        "type" : "_doc",
        "id" : "mydocid",
        "cause" : {
          "type" : "mapper_parsing_exception",
          "reason" : "failed to parse",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "Limit of total fields [1000] has been exceeded while adding new fields [2]"
          }
        },
        "status" : 400
      },
      {
        "index" : ".myindex",
        "type" : "_doc",
        "id" : "mydocid",
        "cause" : {
          "type" : "mapper_parsing_exception",
          "reason" : "failed to parse",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "Limit of total fields [1000] has been exceeded while adding new fields [2]"
          }
        },
        "status" : 400
      }
    ]
  }
}
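The status above is what the tasks API returned for the reindex once it had finished; I retrieved it with something like this (node id and task id are the ones shown in the output):

GET _tasks/my-node-id:31873238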

Is there any way to continue on errors and just log them? I had reindexed only about 180 GB of a 280 GB index, and this was not the first error. I had previously deleted the old data from the new indices (identified as duplicates), made some changes to the field mappings, and started a new reindex task.

Your best option there would be to increase the mapping limit for the index so you can complete the reindex. Otherwise you could try ignore_malformed | Elasticsearch Guide [8.5] | Elastic.
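For example, raising the limit on the backing index that is rejecting documents (.newindex in your output) could look like this; 2000 is just an example value, and index.mapping.total_fields.limit is a dynamic setting, so it can be changed on an existing index:

PUT .newindex/_settings
{
  "index.mapping.total_fields.limit": 2000
}

You would also want the higher limit in the index template so it survives the next rollover.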

@warkolm thank you. You helped a lot.
