Elasticsearch duplicate reindex tasks

I started a reindex task on a single node elasticsearch instance.
POST /_reindex
{
  "source": {
    "index": "logstash-2017.06.29"
  },
  "dest": {
    "index": "log-2017.06.29"
  }
}
I left it to run overnight, and in the morning the VM it was running on was constantly reporting 100% CPU usage.
I executed the following call to see if the reindex had finished:
GET /_tasks?pretty&detailed&actions=*reindex
I found that the Elasticsearch node was running 72 identical reindex tasks.
Any suggestions as to why this happened and how to fix it?

Hi,

Please share the output from the following call, which gave you the impression of 72 identical tasks:
GET _tasks?pretty&detailed&actions=*reindex

Also, if it is still running:
GET _tasks?detailed=true&actions=*reindex
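
As an aside, if you start the reindex with wait_for_completion=false it returns a task ID immediately, which you can then poll directly instead of listing every reindex task (a sketch; node_id:task_number below is a placeholder for whatever ID the call actually returns):

POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "logstash-2017.06.29"
  },
  "dest": {
    "index": "log-2017.06.29"
  }
}

GET _tasks/node_id:task_number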

Here is the output of "GET _tasks?detailed=true&actions=*reindex"
{
  "nodes": {
    "XG7fQBGKSoyRP1lThsf6Eg": {
      "name": "Laa Laa",
      "transport_address": "es_ip:9300",
      "host": "es_ip",
      "ip": "es_ip:9300",
      "roles": [
        "master",
        "data",
        "ingest"
      ],
      "attributes": {
        "ml.enabled": "true"
      },
      "tasks": {
        "XG7fQBGKSoyRP1lThsf6Eg:33268730": {
          "node": "XG7fQBGKSoyRP1lThsf6Eg",
          "id": 33268730,
          "type": "transport",
          "action": "indices:data/write/reindex",
          "status": {
            "total": 8736065,
            "updated": 2681000,
            "created": 0,
            "deleted": 0,
            "batches": 2682,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
              "bulk": 4081,
              "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1,
            "throttled_until_millis": 0
          },
          "description": "reindex from [logstash-2017.06.29] to [log-2017.06.29]",
          "start_time_in_millis": 1499134163592,
          "running_time_in_nanos": 19155089098824,
          "cancellable": true
        },
        "XG7fQBGKSoyRP1lThsf6Eg:32601074": {
          "node": "XG7fQBGKSoyRP1lThsf6Eg",
          "id": 32601074,
          "type": "transport",
          "action": "indices:data/write/reindex",
          "status": {
            "total": 8736065,
            "updated": 2888000,
            "created": 0,
            "deleted": 0,
            "batches": 2889,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
              "bulk": 4425,
              "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1,
            "throttled_until_millis": 0
          },
          "description": "reindex from [logstash-2017.06.29] to [log-2017.06.29]",
          "start_time_in_millis": 1499132363415,
          "running_time_in_nanos": 20955188713161,
          "cancellable": true
        },
        "XG7fQBGKSoyRP1lThsf6Eg:33493490": {
          "node": "XG7fQBGKSoyRP1lThsf6Eg",
          "id": 33493490,
          "type": "transport",
          "action": "indices:data/write/reindex",
          "status": {
            "total": 8736065,
            "updated": 2588000,
            "created": 0,
            "deleted": 0,
            "batches": 2589,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
              "bulk": 3893,
              "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1,
            "throttled_until_millis": 0
          },
          "description": "reindex from [logstash-2017.06.29] to [log-2017.06.29]",
          "start_time_in_millis": 1499134763958,
          "running_time_in_nanos": 18554748489822,
          "cancellable": true
        },
        "XG7fQBGKSoyRP1lThsf6Eg:33092599": {
          "node": "XG7fQBGKSoyRP1lThsf6Eg",
          "id": 33092599,
          "type": "transport",
          "action": "indices:data/write/reindex",
          "status": {
            "total": 8736065,
            "updated": 2730000,
            "created": 0,
            "deleted": 0,
            "batches": 2731,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
              "bulk": 4181,
              "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1,
            "throttled_until_millis": 0
          },
          "description": "reindex from [logstash-2017.06.29] to [log-2017.06.29]",
          "start_time_in_millis": 1499133683552,
          "running_time_in_nanos": 19635108404699,
          "cancellable": true
        },
        ...

Hi,

"status": {
"total": 8736065,
"updated": 2730000,

Looks like the reindex task is still running.
Start time: 1499133683552 (GMT: Tuesday, 4 July 2017 02:01:23.552)

Running time: 19635108404699 ns ≈ 19635 s ≈ 327.25 minutes

You can read more about the reindex Task API here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-task-api
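
For instance, the last task in your snippet can be polled directly by its ID:

GET _tasks/XG7fQBGKSoyRP1lThsf6Eg:33092599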

What gave you the impression that you had 72 identical reindex tasks?

The output I posted above is just a snippet of the full output (hence the truncation at the end). There are 72 task entries in the full output.
I cancelled all these tasks by calling:
POST _tasks/_cancel?nodes=XG7fQBGKSoyRP1lThsf6Eg&actions=*reindex

I also deleted the destination index "log-2017.06.29"; by that point its size and document count were greater than the source's. The source index was about 4.3GB, and after the tasks had been cancelled the destination index was about 10GB.
Yesterday I also tried to reindex other indices and saw the same behavior.
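
For anyone else hitting this: a single task can also be cancelled by its ID rather than cancelling every reindex action on the node, e.g. with one of the task IDs from the output above, and the destination index dropped with a plain delete before retrying:

POST _tasks/XG7fQBGKSoyRP1lThsf6Eg:33268730/_cancel
DELETE /log-2017.06.29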


I have the same problem.
In my case the problem does not appear with all indices.
However, when it does appear, reindexing is very slow.

It seems to loop.


Viorel_Florian

Can you reindex?
One tip: try increasing the "size" to 1000 or more and see whether the duplication stops.

I raised it to 5000 here and it stopped; at least for now I have not had any more problems.

POST /_reindex
{
  "source": {
    "index": "logstash-2017.06.29",
    "size": 1000
  },
  "dest": {
    "index": "log-2017.06.29"
  }
}
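
Since the node is pegged at 100% CPU, it may also be worth throttling the reindex. As far as I know, requests_per_second can be set when the reindex is submitted, or changed on an already-running task via _rethrottle (the value 500 here is only an illustration, and the task ID is one from the output earlier in the thread):

POST /_reindex?requests_per_second=500
{
  "source": {
    "index": "logstash-2017.06.29",
    "size": 1000
  },
  "dest": {
    "index": "log-2017.06.29"
  }
}

POST _reindex/XG7fQBGKSoyRP1lThsf6Eg:33268730/_rethrottle?requests_per_second=500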

By default _reindex uses scroll batches of 1000. You can change the batch size with the size field in the source element.

I notice the bulk figure under the retries tree. This is the number of retries attempted by the reindex; bulk is the number of bulk actions that were retried.

From logstash-2017.06.29 to log-2017.06.29: what do the shard numbers look like? What resources does this single-node cluster that is reporting 100% CPU usage have? This looks like pushback from a resources perspective.
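
Something like the following should show the shard counts and sizes for both indices (a sketch; adjust the index names as needed):

GET _cat/shards/logstash-2017.06.29?v
GET _cat/shards/log-2017.06.29?v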

JKhondhu

No, the default is 100 documents.

From the reference docs (Reindex size default):

"By default _reindex uses scroll batches of 100. You can change the batch size with the size field in the source element."

Yes, you are correct, my typo whilst scanning and typing my response. Thank you.


Sorry for the "radio silence"
I tried reindexing with size 1000 and then with 10000. In both cases the the tasks where duplicated as before.
As for the resources, the CPU usage is 5% before the reindex and varies between 70% and 90% during the reindex until the task is duplicated, at which point CPU usage climes to 100%.
Also I have noticed that with reindex size 100 the duplication happens earlier in the reindexing process, i.e. the bigger the size the later the duplication error occurs.

Further info:

  • Elasticsearch version 5.4.0
  • 8 CPU cores
  • 32GB RAM
  • 100GB HDD (20GB free)
  • index size is 6.5GB with 11.6M documents

EDIT: I have just noticed that the duplication does not always happen with the 10000 size value.
