Elasticsearch 5.1.1 remote reindex process aborts without any error


(hailima@hotmail.com) #1

Hi All,

I am migrating an index from ES 1.4.4 to 5.1.1 using the "_reindex" API. The reindex process always aborts without any error before it has processed all records. Here are the details:

  1. Create a new index with settings on the ES 5.1.1 server:
(PUT) http://hostname5_1_1:9200/nexttextindex
{
  "settings": {
    "index": {
      "number_of_replicas": "0",
      "number_of_shards": "5",
      "refresh_interval": "-1"
    }
  }
}  
  2. Start the reindex from the remote ES 1.4.4 server:
(POST) http://hostname5_1_1:9200/_reindex?wait_for_completion=false
{
  "source": {
    "remote": {
      "host": "http://hostname1_4_4.com:9200"
    },
    "index": "nexttextindex",
    "size": 10
  },
  "dest": {
    "index": "nexttextindex"
  }
}
  3. Use the task API to monitor the reindex process:
(GET) http://hostname5_1_1:9200/_tasks?detailed=true&actions=*reindex

{
  "nodes": {
    "H_cHXZN9SnqbdBNaew5wew": {
      "name": "hostname5_1_1",
      "transport_address": "10.169.167.203:9300",
      "host": "10.169.167.203",
      "ip": "10.169.167.203:9300",
      "roles": [
        "master",
        "data",
        "ingest"
      ],
      "tasks": {
        "H_cHXZN9SnqbdBNaew5wew:198701": {
          "node": "H_cHXZN9SnqbdBNaew5wew",
          "id": 198701,
          "type": "transport",
          "action": "indices:data/write/reindex",
          "status": {
            "total": 5653044,
            "updated": 0,
            "created": 43350,
            "deleted": 0,
            "batches": 6,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
              "bulk": 0,
              "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": 0,
            "throttled_until_millis": 0
          },
          "description": "",
          "start_time_in_millis": 1484860595578,
          "running_time_in_nanos": 11292914938,
          "cancellable": true
        }
      }
    }
  }
}
  4. As you can see from 3), everything looks good. But after a couple of hours the task is done and only about 60K records have been indexed on the ES 5.1.1 server instead of 5653044. I repeated the process a few times and it always aborted without any errors.

I'd appreciate any help!


(Nik Everett) #2

You should be able to get the status of those reindex tasks with GET /_tasks/<taskId> where the <taskId> is whatever id was returned when you started. If you didn't store them then you should be able to look around with something like GET .tasks/_search. Those should contain the failure reason. Or, if it thinks it finished successfully, it should show you that.


(hailima@hotmail.com) #3

Thanks for the quick reply, Nik.

I used "GET /_tasks/<taskId>", but it's not working. Here are the details:

GET .. 9200/_tasks/198701

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "malformed task id 198701"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "malformed task id 198701"
  },
  "status": 400
}


(Nik Everett) #4

You'll need the part before the : in the task id as well. If you don't have it then try the search I sent.
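
To make that concrete: in the `GET /_tasks` output earlier in the thread, the key under "tasks" (`H_cHXZN9SnqbdBNaew5wew:198701`) is the full task id, and that whole string is what goes after `/_tasks/`. A minimal Python sketch, assuming you have the parsed JSON of that response; the helper names `full_task_ids` and `task_status_path` are made up for illustration and are not part of any client library:

```python
def full_task_ids(tasks_response):
    """Collect full task ids ("<nodeId>:<number>") from a parsed GET /_tasks body."""
    ids = []
    for node in tasks_response.get("nodes", {}).values():
        # Each key under "tasks" is already the full id,
        # e.g. "H_cHXZN9SnqbdBNaew5wew:198701".
        ids.extend(node.get("tasks", {}).keys())
    return ids


def task_status_path(node_id, task_number):
    """Build the path for GET /_tasks/<taskId> from its two parts."""
    return "/_tasks/{}:{}".format(node_id, task_number)


# Trimmed copy of the response shown in post #1.
sample_tasks_response = {
    "nodes": {
        "H_cHXZN9SnqbdBNaew5wew": {
            "name": "hostname5_1_1",
            "tasks": {
                "H_cHXZN9SnqbdBNaew5wew:198701": {
                    "node": "H_cHXZN9SnqbdBNaew5wew",
                    "id": 198701,
                    "action": "indices:data/write/reindex",
                }
            },
        }
    }
}

print(full_task_ids(sample_tasks_response))
print(task_status_path("H_cHXZN9SnqbdBNaew5wew", 198701))
```

Passing only the number (`198701`) is what produces the "malformed task id" error above.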

You should wrap your code blocks in ``` so they are readable.


(hailima@hotmail.com) #5

OK, the "..9200/.tasks/_search" command is a good one; it gives me the error info:

1) Error:
```
"type": "process_cluster_event_timeout_exception",
"reason": "failed to process cluster event (put-mapping) within 30s"
},
"status": 503
```

Is there any way to change the timeout (30s) to something longer? I have many different doc types in the index, and a doc type can hold more than 1k docs.

2) If an error occurs during the reindexing process, is there any way to ignore the error and continue? I have a few million records, so restarting from scratch is time-consuming. It's OK if I lose some data in the reindex process.

(Nik Everett) #6

Probably, but I don't know it offhand. I think you are better off manually creating the mapping before running reindex.

No. There isn't a place to store the errors so we never implemented this.

If you want some protection against this maybe try reindexing in chunks?


(hailima@hotmail.com) #7

I tried the following with the tasks API:

/_tasks/210411
/_tasks/210411:1

None of them works, but that doesn't bother me since .tasks/_search is good.

The docs for this part are not clear. Where can I find descriptions of the status codes like 404 or 503? Thanks


(hailima@hotmail.com) #8

We don't need the errors stored; we just want the reindex to continue with the next records instead of aborting on the last failure. Is there any setting to make that happen? And what do you mean by "reindexing in chunks"? We are already using "size" for batching, right?


(Nik Everett) #9

There isn't.

Use a query to limit what you are reindexing to certain days or namespaces or something. Whatever natural division your data has. Then do it again and again until you migrate all the data.
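
A minimal Python sketch of that idea, assuming the documents carry a date field to split on. The field name `created_at`, the date range, and the helper names `day_ranges` and `reindex_body` are all hypothetical; the host and index names are the ones from post #1. It only builds the request bodies, one per chunk; each would then be POSTed to `/_reindex` on the 5.1.1 server the same way as the original request.

```python
import json
from datetime import date, timedelta


def day_ranges(start, end, days_per_chunk):
    """Yield (chunk_start, chunk_end) pairs covering [start, end)."""
    step = timedelta(days=days_per_chunk)
    current = start
    while current < end:
        nxt = min(current + step, end)
        yield current, nxt
        current = nxt


def reindex_body(remote_host, index, date_field, chunk_start, chunk_end, batch_size=10):
    """Build one remote-reindex body limited to a single date range."""
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": index,
            "size": batch_size,
            # Half-open range so adjacent chunks do not overlap.
            "query": {
                "range": {
                    date_field: {
                        "gte": chunk_start.isoformat(),
                        "lt": chunk_end.isoformat(),
                    }
                }
            },
        },
        "dest": {"index": index},
    }


# Hypothetical: split January 2016 into 10-day chunks on "created_at".
chunk_bodies = [
    reindex_body("http://hostname1_4_4.com:9200", "nexttextindex", "created_at", s, e)
    for s, e in day_ranges(date(2016, 1, 1), date(2016, 1, 31), 10)
]

for body in chunk_bodies:
    print(json.dumps(body))
```

If one chunk fails, only that range has to be rerun, which is the protection "size" alone does not give you: "size" only controls how many documents go into each batch of a single run.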


(hailima@hotmail.com) #10

Thanks, that's very helpful! Is there any way to clean up the task errors that show up under .tasks/_search?


(Nik Everett) #11

You can and should delete them when you are done with them. You can use delete-by-query or delete each one when you know you are done with it.
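
A rough Python sketch of the "delete each one" option, assuming you have the parsed JSON returned by `GET .tasks/_search`. It relies only on the standard `_index`, `_type` and `_id` fields that every search hit carries; the sample hit below and the helper name `task_doc_delete_paths` are hypothetical. Each returned path would be sent as an HTTP DELETE.

```python
def task_doc_delete_paths(search_response):
    """Return one DELETE path per task document in a parsed .tasks/_search body."""
    paths = []
    for hit in search_response.get("hits", {}).get("hits", []):
        # Address the stored task document by its own index, type and id.
        paths.append("/{}/{}/{}".format(hit["_index"], hit["_type"], hit["_id"]))
    return paths


# Hypothetical, trimmed search response with a single stored task document.
sample_search_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": ".tasks",
                "_type": "task",
                "_id": "H_cHXZN9SnqbdBNaew5wew:198701",
                "_source": {},
            }
        ],
    }
}

for path in task_doc_delete_paths(sample_search_response):
    print("DELETE " + path)
```

Only delete a document once you have read the failure or result stored in it, since that is the only record of how the reindex ended.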


(hailima@hotmail.com) #12

OK, thanks. If I don't delete them, do they expire automatically? If so, when?


(hailima@hotmail.com) #13

Hi Nik,

The task cancel API is not working for me on ES 5.1.1.

Here is what I tried, based on the reindex docs:

(POST) /_tasks/1241809:1/_cancel
(POST) /_tasks/1241809/_cancel

Any ideas?

Thanks


(Eric Hibbs) #14

We are currently over a month behind on a migration because we have to babysit each and every reindex because of silent failures. We've resorted to setting the logger.root to "Debug". Good luck!


(hailima@hotmail.com) #15

I wrapped the code inside xxxxx. It's not working as expected.


(system) #16

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.