Debugging Partially Complete Reindexing Task?

micah.williamson · January 7, 2019, 8:39pm

I am currently trying to reindex an existing index into a new index with an updated schema. The issue I'm having is that the reindex gives up on my ~2.5M-document index after roughly 0.5M documents (the exact point varies between runs). To be clear, I don't mean that the request times out: even if I pass the ?wait_for_completion=false flag, the reindex task itself ends prematurely. I've already asked this question on Stack Overflow, but I'm asking it here as well, since this might be the better place for these sorts of questions.

My Elasticsearch cluster is hosted on AWS, but as far as I can tell, no errors are reported. I've tried the different options of the reindex endpoint, but none have made a difference. I've tried reindexing into the new schema as well as into the very same schema as the existing index.

The request I'm making is:

POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "users_v1"
  },
  "dest": {
    "index": "users_v2"
  }
}

Despite the v1/v2 names in the example, this is actually a reindex from v8 to v9, so I've already done this successfully seven times.
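If you want to drive this from a script instead of the console, here is a minimal sketch using only Python's standard library. The function names build_async_reindex_request and submit are hypothetical, and authentication is not handled: an AWS-hosted domain may require signed requests or other credentials depending on its access policy. With wait_for_completion=false, the response body contains a "task" id that you can later pass to GET _tasks/<task_id>.

```python
import json
import urllib.request


def build_async_reindex_request(base_url: str, source: str, dest: str) -> urllib.request.Request:
    """Build (but do not send) a POST /_reindex request that runs as a background task."""
    body = {"source": {"index": source}, "dest": {"index": dest}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_reindex?wait_for_completion=false",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(req: urllib.request.Request) -> str:
    """Send the request and return the task id from the response body (no auth handling)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["task"]
```

For example, submit(build_async_reindex_request("http://localhost:9200", "users_v1", "users_v2")) would return the task id to keep for later status checks. Building the request separately from sending it keeps the body easy to inspect before anything touches the cluster.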

Am I hitting resource limits? Could it be a single poison-pill document? How do I go about debugging this when no errors are reported?

tamara · January 9, 2019, 4:51pm

Hi,

If a reindex stops partway, you can use either of two properties to help complete it without starting over.

The first option is:

POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "index": "users_v1"
  },
  "dest": {
    "index": "users_v2",
    "op_type": "create"
  }
}

Setting op_type to create will cause _reindex to create only the documents that are missing from the target index.
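To make the op_type=create rule concrete, here is a toy, in-memory simulation: plain Python dicts keyed by document _id stand in for the two indices, and the function name reindex_create_only is hypothetical (this is not the Elasticsearch client). It models the combination with "conflicts": "proceed", where an already existing document counts as a version conflict and is skipped instead of aborting the run.

```python
from typing import Dict, Tuple

# Toy model: an "index" is a dict mapping document _id -> document body.
Doc = Dict[str, object]


def reindex_create_only(source: Dict[str, Doc], dest: Dict[str, Doc]) -> Tuple[int, int]:
    """Simulate _reindex with dest.op_type="create" and conflicts="proceed".

    Documents missing from dest are copied over. Documents whose _id already
    exists in dest cause a version conflict; with "proceed" the conflict is
    counted and skipped rather than aborting. Mutates dest in place and
    returns (created, version_conflicts).
    """
    created = conflicts = 0
    for doc_id, doc in source.items():
        if doc_id in dest:
            conflicts += 1  # create fails for an existing _id; the run continues
        else:
            dest[doc_id] = dict(doc)
            created += 1
    return created, conflicts
```

Running it twice over the same source shows why this is safe for resuming: the second pass creates nothing and only reports conflicts. In a real run, the equivalent numbers appear as created and version_conflicts in the reindex response.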

Alternatively, you can rerun the reindex after the first attempt with:

POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "index": "users_v1"
  },
  "dest": {
    "index": "users_v2",
    "version_type": "external"
  }
}

Setting version_type to external will cause Elasticsearch to preserve the version from the source, create any documents that are missing, and update any documents that have an older version in the destination index than they do in the source index.
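The external-versioning rule can be sketched the same way. This is a toy simulation under stated assumptions: each index is a dict mapping _id to a (version, body) pair, the function name reindex_external_version is hypothetical, and "conflicts": "proceed" is assumed so that a destination document with the same or a newer version is counted as a conflict and skipped.

```python
from typing import Dict, Tuple

# Toy model: an "index" maps document _id -> (version, body).
Index = Dict[str, Tuple[int, dict]]


def reindex_external_version(source: Index, dest: Index) -> Dict[str, int]:
    """Simulate _reindex with dest.version_type="external" and conflicts="proceed".

    Missing documents are created with the source version; documents with an
    older version in dest are overwritten; documents whose dest version is the
    same or newer are counted as version conflicts and left untouched.
    """
    counts = {"created": 0, "updated": 0, "version_conflicts": 0}
    for doc_id, (version, body) in source.items():
        if doc_id not in dest:
            dest[doc_id] = (version, dict(body))
            counts["created"] += 1
        elif dest[doc_id][0] < version:
            dest[doc_id] = (version, dict(body))
            counts["updated"] += 1
        else:
            # dest already holds the same or a newer version: skip it
            counts["version_conflicts"] += 1
    return counts
```

Compared with op_type=create, this variant also refreshes documents that changed in the source after the first partial run, at the cost of comparing versions for every document.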

And don't forget to set "conflicts": "proceed", so that version conflicts don't abort the reindex.

I would also recommend tracking the reindex task with the Task API (see "Reindex works with Tasks API"):
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-task-api
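When polling GET _tasks/<task_id>, the useful signals for a run that stops early are the progress counters and, once the task is completed, any "error" or "response.failures" entries. The helper below is a hypothetical sketch that assumes the response shape described in the Task API documentation (completed, task.status with total/created/updated/deleted/noops/version_conflicts, response.failures, error); treat the field handling as an assumption and check it against your cluster's actual output.

```python
from typing import Any, Dict


def summarize_reindex_task(task_result: Dict[str, Any]) -> str:
    """Summarize a GET _tasks/<task_id> response for a reindex task (assumed shape)."""
    status = task_result.get("task", {}).get("status", {})
    total = status.get("total", 0)
    # Count every document the task has handled, including skipped conflicts.
    processed = sum(
        status.get(key, 0)
        for key in ("created", "updated", "deleted", "noops", "version_conflicts")
    )
    parts = [f"{processed}/{total} processed"]
    if not task_result.get("completed", False):
        parts.append("still running")
        return ", ".join(parts)
    if "error" in task_result:
        # A task that failed outright reports an error object instead of a normal response.
        parts.append(f"task error: {task_result['error'].get('type', 'unknown')}")
    failures = task_result.get("response", {}).get("failures", [])
    if failures:
        parts.append(f"{len(failures)} failure(s)")
    if total and processed < total:
        parts.append("finished early")
    return ", ".join(parts)
```

If the summary says the task finished early but lists no error and no failures, that points away from a single bad document and toward the task being interrupted, which is worth checking in the cluster logs.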

system · February 6, 2019, 4:51pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Losing documents while Reindex	Elasticsearch	13	5575	January 1, 2019
[SOLVED] Elasticsearch python API and reindex module	Elasticsearch	11	3201	July 5, 2017
Not all documents copied after reindex	Elasticsearch	18	6874	August 20, 2018
Timeout issue when using reindex API	Elasticsearch	1	2865	January 1, 2020
Reindexing due to index incompatibility gone wrong, can't find task result	Elasticsearch (migration, reindex, snapshot-and-restore)	3	21	October 28, 2024