Elasticsearch duplicate reindex tasks

I started a reindex task on a single node elasticsearch instance.
POST /_reindex
{
  "source": {
    "index": "logstash-2017.06.29"
  },
  "dest": {
    "index": "log-2017.06.29"
  }
}
I left it to run overnight, and in the morning the VM it was running on was constantly reporting 100% CPU usage.
I executed the following call to see if the reindex had finished:
GET /_tasks?pretty&detailed&actions=*reindex
I found that the Elasticsearch node was running 72 identical reindex tasks.
Any suggestions as to why this happened and how to fix it?

Hi,

Please share the output from the following call, which gave you the impression of 72 identical tasks:
GET _tasks?pretty&detailed&actions=*reindex

Also, if it is still running:
GET _tasks?detailed=true&actions=*reindex
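
As an aside, if you start the reindex with wait_for_completion=false it returns a task ID immediately, which you can then poll directly instead of listing every reindex task (a sketch; node_id:task_number below is a placeholder for whatever ID the call actually returns):

POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "logstash-2017.06.29"
  },
  "dest": {
    "index": "log-2017.06.29"
  }
}

GET _tasks/node_id:task_number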

Here is the output of "GET _tasks?detailed=true&actions=*reindex"
{
  "nodes": {
    "XG7fQBGKSoyRP1lThsf6Eg": {
      "name": "Laa Laa",
      "transport_address": "es_ip:9300",
      "host": "es_ip",
      "ip": "es_ip:9300",
      "roles": [
        "master",
        "data",
        "ingest"
      ],
      "attributes": {
        "ml.enabled": "true"
      },
      "tasks": {
        "XG7fQBGKSoyRP1lThsf6Eg:33268730": {
          "node": "XG7fQBGKSoyRP1lThsf6Eg",
          "id": 33268730,
          "type": "transport",
          "action": "indices:data/write/reindex",
          "status": {
            "total": 8736065,
            "updated": 2681000,
            "created": 0,
            "deleted": 0,
            "batches": 2682,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
              "bulk": 4081,
              "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1,
            "throttled_until_millis": 0
          },
          "description": "reindex from [logstash-2017.06.29] to [log-2017.06.29]",
          "start_time_in_millis": 1499134163592,
          "running_time_in_nanos": 19155089098824,
          "cancellable": true
        },
        "XG7fQBGKSoyRP1lThsf6Eg:32601074": {
          "node": "XG7fQBGKSoyRP1lThsf6Eg",
          "id": 32601074,
          "type": "transport",
          "action": "indices:data/write/reindex",
          "status": {
            "total": 8736065,
            "updated": 2888000,
            "created": 0,
            "deleted": 0,
            "batches": 2889,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
              "bulk": 4425,
              "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1,
            "throttled_until_millis": 0
          },
          "description": "reindex from [logstash-2017.06.29] to [log-2017.06.29]",
          "start_time_in_millis": 1499132363415,
          "running_time_in_nanos": 20955188713161,
          "cancellable": true
        },
        "XG7fQBGKSoyRP1lThsf6Eg:33493490": {
          "node": "XG7fQBGKSoyRP1lThsf6Eg",
          "id": 33493490,
          "type": "transport",
          "action": "indices:data/write/reindex",
          "status": {
            "total": 8736065,
            "updated": 2588000,
            "created": 0,
            "deleted": 0,
            "batches": 2589,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
              "bulk": 3893,
              "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1,
            "throttled_until_millis": 0
          },
          "description": "reindex from [logstash-2017.06.29] to [log-2017.06.29]",
          "start_time_in_millis": 1499134763958,
          "running_time_in_nanos": 18554748489822,
          "cancellable": true
        },
        "XG7fQBGKSoyRP1lThsf6Eg:33092599": {
          "node": "XG7fQBGKSoyRP1lThsf6Eg",
          "id": 33092599,
          "type": "transport",
          "action": "indices:data/write/reindex",
          "status": {
            "total": 8736065,
            "updated": 2730000,
            "created": 0,
            "deleted": 0,
            "batches": 2731,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
              "bulk": 4181,
              "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1,
            "throttled_until_millis": 0
          },
          "description": "reindex from [logstash-2017.06.29] to [log-2017.06.29]",
          "start_time_in_millis": 1499133683552,
          "running_time_in_nanos": 19635108404699,
          "cancellable": true
        },
        ...

Hi,

"status": {
"total": 8736065,
"updated": 2730000,

Looks like the reindex task is still running.
Start time: 1499133683552 (GMT: Tuesday, 4 July 2017 02:01:23.552)

Running time: 19635108404699 ns ≈ 19635 s ≈ 327.25 minutes

You can read more about the reindex Task API here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-task-api
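
For instance, the last task in your snippet can be polled directly by its ID:

GET _tasks/XG7fQBGKSoyRP1lThsf6Eg:33092599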

What gave you the impression that you had 72 identical reindex tasks?

The output I posted above is just a snippet of the full output (hence the truncation at the end). There are 72 task entries in the full output.
I cancelled all these tasks by calling:
POST _tasks/_cancel?nodes=XG7fQBGKSoyRP1lThsf6Eg&actions=*reindex

I also deleted the destination index "log-2017.06.29"; by that point its size and document count were greater than the source's. The source index was about 4.3GB, and after the tasks had been cancelled the destination index was about 10GB.
Yesterday I also tried to reindex other indices and saw the same behavior.
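
For anyone else hitting this: a single task can also be cancelled by its ID rather than cancelling every reindex action on the node, e.g. with one of the task IDs from the output above, and the destination index dropped with a plain delete before retrying:

POST _tasks/XG7fQBGKSoyRP1lThsf6Eg:33268730/_cancel
DELETE /log-2017.06.29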


I have the same problem.
In my case the problem does not appear with all indices.
However, when it does appear, reindexing is very slow.

It seems to loop.


Viorel_Florian

Can you reindex?
One tip: try increasing the "size" to 1000 or more and see whether the duplication stops.

I raised it to 5000 here and it stopped; at least for now I have not had any more problems.

POST /_reindex
{
  "source": {
    "index": "logstash-2017.06.29",
    "size": 1000
  },
  "dest": {
    "index": "log-2017.06.29"
  }
}
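
Since the node is pegged at 100% CPU, it may also be worth throttling the reindex. As far as I know, requests_per_second can be set when the reindex is submitted, or changed on an already-running task via _rethrottle (the value 500 here is only an illustration, and the task ID is one from the output earlier in the thread):

POST /_reindex?requests_per_second=500
{
  "source": {
    "index": "logstash-2017.06.29",
    "size": 1000
  },
  "dest": {
    "index": "log-2017.06.29"
  }
}

POST _reindex/XG7fQBGKSoyRP1lThsf6Eg:33268730/_rethrottle?requests_per_second=500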

By default _reindex uses scroll batches of 1000. You can change the batch size with the size field in the source element.

I notice the bulk figure under the retries tree. This is the number of retries attempted by the reindex; bulk is the number of bulk actions that were retried.

From logstash-2017.06.29 to log-2017.06.29: what do the shard numbers look like? What resources does this single-node cluster that is reporting 100% CPU usage have? This looks like pushback from a resources perspective.
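
Something like the following should show the shard counts and sizes for both indices (a sketch; adjust the index names as needed):

GET _cat/shards/logstash-2017.06.29?v
GET _cat/shards/log-2017.06.29?v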

JKhondhu

No, the default is 100 documents.

From the reference docs (Reindex size default):

"By default _reindex uses scroll batches of 100. You can change the batch size with the size field in the source element."

Yes, you are correct, my typo whilst scanning and typing my response. Thank you.


Sorry for the "radio silence"
I tried reindexing with size 1000 and then with 10000. In both cases the the tasks where duplicated as before.
As for the resources, the CPU usage is 5% before the reindex and varies between 70% and 90% during the reindex until the task is duplicated, at which point CPU usage climes to 100%.
Also I have noticed that with reindex size 100 the duplication happens earlier in the reindexing process, i.e. the bigger the size the later the duplication error occurs.

Further info:

  • Elasticsearch version 5.4.0
  • 8 CPU cores
  • 32GB RAM
  • 100GB HDD (20GB free)
  • index size is 6.5GB with 11.6M documents

EDIT: I have just noticed that the duplication does not always happen with the 10000 size value.
