Strange reindex behaviour

leandrojmp · May 26, 2021, 6:38pm

I'm reindexing daily indices into monthly ones, and sometimes the reindex process seems to enter a loop: it starts again after it has already reindexed all the documents.

For example, take the index indexName-2021.05.01, which has 500,000 documents, and reindex it into indexName-2021.05. I start the reindex with the following request:

POST _reindex
{
  "source": {
    "index": "indexName-2021.05.01"
  },
  "dest": {
    "index": "indexName-2021.05"
  }
}

Then I use GET _tasks?actions=*reindex&detailed to get the task ID and GET _tasks/taskID to monitor the progress.

I can see the created count increasing as expected.

          "status" : {
            "total" : 500000,
            "updated" : 0,
            "created" : 184000,
            "deleted" : 0,
            "batches" : 184,
            "version_conflicts" : 0,
            "noops" : 0,
            "retries" : {
              "bulk" : 0,
              "search" : 0
            }
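
The progress check above can also be scripted. The following is a minimal Python sketch, not something taken from the Elasticsearch clients: summarize_reindex_status is a hypothetical helper that takes the "status" object of a GET _tasks/taskID response, already parsed into a dict. It assumes that every document handled so far is counted in exactly one of created, updated, deleted, noops, or version_conflicts.

```python
def summarize_reindex_status(status):
    """Summarize the "status" object of a reindex task.

    Returns (processed, percent, phase). The field names match the
    status excerpt above; the phase labels are illustrative only.
    """
    # Assumption: each processed document is counted in exactly one field.
    counted_fields = ("created", "updated", "deleted", "noops", "version_conflicts")
    processed = sum(status.get(field, 0) for field in counted_fields)

    total = status.get("total", 0)
    percent = round(100.0 * processed / total, 1) if total else 0.0

    created = status.get("created", 0)
    updated = status.get("updated", 0)
    if created > 0:
        phase = "creating new documents"
    elif updated > 0:
        # Only updates: the destination already holds these document IDs.
        phase = "overwriting existing documents"
    else:
        phase = "no writes yet"
    return processed, percent, phase


if __name__ == "__main__":
    first_run = {"total": 500000, "updated": 0, "created": 184000,
                 "deleted": 0, "batches": 184, "version_conflicts": 0, "noops": 0}
    print(summarize_reindex_status(first_run))
```

A status that reports only updated documents, with created at 0, suggests the task is writing over documents that already exist in the destination index.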

I also use Discover in Kibana to compare the documents: in one tab I filter by the daily index and in the other I filter by the monthly index, both using the same time interval, which corresponds to the time interval present in the daily index.

After some time I check the progress again using the task ID and get a response saying that the task no longer exists. In Discover I can see that all documents were reindexed as expected.

But if I run GET _tasks?actions=*reindex&detailed to list the running tasks, I can see a few tasks reindexing the same index, and now it is the updated field of the status response that is increasing.

          "status" : {
            "total" : 500000,
            "updated" : 175000,
            "created" : 0,
            "deleted" : 0,
            "batches" : 175,
            "version_conflicts" : 0,
            "noops" : 0,
            "retries" : {
              "bulk" : 0,
              "search" : 0
            }
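
To check whether several reindex tasks are running for the same source and destination, the task listing can be grouped by description. This is a minimal Python sketch under stated assumptions: find_duplicate_reindex_tasks is a hypothetical helper, the response of GET _tasks?actions=*reindex&detailed is assumed to be parsed into a dict shaped as nodes -> tasks -> task, and tasks with an identical "description" string are assumed to be copies of the same reindex request.

```python
from collections import defaultdict


def find_duplicate_reindex_tasks(tasks_response):
    """Group task IDs by description and keep groups with more than one task.

    Assumes the layout {"nodes": {node_id: {"tasks": {task_id: {...}}}}}.
    """
    by_description = defaultdict(list)
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            by_description[task.get("description", "")].append(task_id)

    # Only descriptions shared by two or more tasks count as duplicates.
    return {description: sorted(task_ids)
            for description, task_ids in by_description.items()
            if len(task_ids) > 1}


if __name__ == "__main__":
    # Node names, task IDs, and description text here are made up for illustration.
    sample = {"nodes": {
        "nodeA": {"tasks": {
            "nodeA:101": {"description": "reindex from [indexName-2021.05.01] to [indexName-2021.05]"}}},
        "nodeB": {"tasks": {
            "nodeB:202": {"description": "reindex from [indexName-2021.05.01] to [indexName-2021.05]"}}},
    }}
    print(find_duplicate_reindex_tasks(sample))
```

An empty result means no two listed tasks share a description.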

My question is: what can cause this to happen, since the documents were already indexed?

Sometimes this reindex loop takes a long time, which considerably increases the load and CPU usage of the nodes running the reindex.

Normally, when this happens and I can confirm that all the documents were reindexed, I cancel all the reindex tasks using:

POST _tasks/_cancel?actions=*reindex

But I would like to know why this is happening and how to avoid it.

I'm currently running version 7.9.3; an upgrade to 7.12.1 is planned for the coming weeks.

system · June 23, 2021, 6:38pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
