Elasticsearch Reindex API - reindex only missing docs

Dom_Sie · April 21, 2017, 6:52am

Hi all,

We are trying to reindex big indices (about 10 million docs per index) with this curl command:

curl -XPOST 'http://localhost:9200/_reindex?slices=5&refresh' -d '{
  "conflicts": "proceed",
  "source": {
    "index": "'.$index.'",
    "size": 10000
  },
  "dest": {
    "index": "'.$index.$version_string.'",
    "op_type": "create"
  }
}'

The reindex completed successfully for most indices.
For two indices, however, it keeps breaking off.
One of the affected indices still has 500 docs left to reindex, and I keep retrying to reindex the missing docs, so far without success.

How can I reindex only the docs that are missing between two indices, or how do I have to modify my reindex command so that the process runs through to completion?

Sometimes the reindex process throws a "SearchContextMissingException" when it cannot resolve a scroll ID; sometimes a data node temporarily leaves the cluster and we get a "node_not_connected_exception".

This was my last unsuccessful try:

count v1: 8039457
count v2: 8038957

+++ COUNT IS DIFFERENT ;; Start reindexing 2016_10 for the 64 time

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   276    0   152    0   124      0      0 --:--:--  1:10:11 --:--:--     6

+++RESPONSE: {"took":4211061,"timed_out":false,"total":8039457,"updated":0,"created":0,"batches":804,"version_conflicts":8035457,"noops":0,"retries":0,"failures":[]}

For more background, here is the status of the current reindex tasks for these two indices:

{
  "nodes": {
    "LG1ycx-6STKYenLnqSMZIg": {
      "name": "client_xx",
      "transport_address": "x.x.x.x:9300",
      "host": "x.x.x.x",
      "ip": "x.x.x.x:9300",
      "attributes": {
        "rack": "xxx",
        "rack_id": "xxx",
        "data": "false",
        "master": "false"
      },
      "tasks": {
        "LG1ycx-6STKYenLnqSMZIg:5559629": {
          "node": "LG1ycx-6STKYenLnqSMZIg",
          "id": 5559629,
          "type": "transport",
          "action": "indices:data/write/reindex",
          "status": {
            "total": 5150349,
            "updated": 0,
            "created": 0,
            "deleted": 0,
            "batches": 231,
            "version_conflicts": 2310000,
            "noops": 0,
            "retries": 0
          },
          "description": "",
          "start_time_in_millis": 1492754781226,
          "running_time_in_nanos": 1222719041139
        },
        "LG1ycx-6STKYenLnqSMZIg:5551067": {
          "node": "LG1ycx-6STKYenLnqSMZIg",
          "id": 5551067,
          "type": "transport",
          "action": "indices:data/write/reindex",
          "status": {
            "total": 8039457,
            "updated": 0,
            "created": 0,
            "deleted": 0,
            "batches": 706,
            "version_conflicts": 7060000,
            "noops": 0,
            "retries": 0
          },
          "description": "",
          "start_time_in_millis": 1492752458445,
          "running_time_in_nanos": 3545465607841
        }
      }
    }
  }
}

nik9000 · April 21, 2017, 1:47pm

There isn't really anything reindex can do about this. The scrolls aren't resumable on another node, so when the node holding the scroll context drops out, the request fails. It is probably worth figuring out why this happens in your cluster and fixing it. But you should be able to work around it by chunking the reindex into several smaller requests, each filtering on some field in your documents, such as a time range or a keyword field.
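
The chunking idea could be sketched roughly like this. It is only an illustration under assumptions: the documents are assumed to have a date field (called `@timestamp` here, which is a guess and not something from the original post), the destination index name `2016_10_v2` is made up, and the helper `chunked_reindex_bodies` is hypothetical. It only builds the `_reindex` request bodies, one per time window; sending them is left out.

```python
import json
from datetime import date, timedelta


def chunked_reindex_bodies(source_index, dest_index, field, start, end, step_days=1):
    """Yield one _reindex request body per half-open date window [lo, hi)."""
    lo = start
    while lo < end:
        hi = min(lo + timedelta(days=step_days), end)
        yield {
            # Skip docs that already exist in the destination instead of aborting.
            "conflicts": "proceed",
            "source": {
                "index": source_index,
                "size": 1000,  # scroll batch size; smaller than in the original post
                # Restrict this request to one time window.
                "query": {
                    "range": {field: {"gte": lo.isoformat(), "lt": hi.isoformat()}}
                },
            },
            "dest": {"index": dest_index, "op_type": "create"},
        }
        lo = hi


if __name__ == "__main__":
    # Index name 2016_10 is from the post; field and destination are assumptions.
    for body in chunked_reindex_bodies(
        "2016_10", "2016_10_v2", "@timestamp",
        date(2016, 10, 1), date(2016, 11, 1), step_days=7,
    ):
        print(json.dumps(body))
```

Each printed body would be sent as its own POST to `_reindex`. If one window fails with a missing search context, only that window has to be retried, and because of `op_type: create` with `conflicts: proceed`, a retry skips the docs that were already copied.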

