Reindex in a cluster does not get all documents

Elastic Stack Version: 6.5

I am trying to reindex my index "old_index" to another index "new_index" on the same cluster.

 POST _reindex
 {
   "source": {
     "index": "old_index",
     "_source": {
       "include": [
         "@timestamp",
         "@version",
         "field1",
         "field2",
         "field3"
       ]
     }
   },
   "script": {
     "inline": """ctx._source.new_field1= ctx._source.field1;ctx._source.remove("field1");"""
   },
   "dest": {
     "index": "new_index"
   }
 }

I have a 3-node cluster. "old_index" holds more than 2M documents, but "new_index" ends up with only about 500K.

I have run

GET _tasks?actions=*reindex&detailed

It shows that reindexing is done, but I am not sure where the rest of my data is.
Any help would be much appreciated!

Can you share the output of GET _cat/indices/old_index,new_index?v please?

Can you also share the output of the reindex task, or the response of the reindex request itself, since you did not start it in the background?

Running

GET _tasks?actions=*reindex&detailed

while the reindex was running, I got:

{
      "nodes" : {
        "aOYQyOfCQiyOGT9HaQr8nQ" : {
          "name" : "SOME-NODE-NAME",
          "transport_address" : "some-ip:9300",
          "host" : "some-ip",
          "ip" : "some-ip:9300",
          "roles" : [
            "data"
          ],
          "attributes" : {
            "ml.machine_memory" : "17096605696",
            "xpack.installed" : "true",
            "ml.max_open_jobs" : "20",
            "ml.enabled" : "true"
          },
          "tasks" : {
            "aOYQyOfCQiyOGT9HaQr8nQ:91604987" : {
              "node" : "aOYQyOfCQiyOGT9HaQr8nQ",
              "id" : 91604987,
              "type" : "transport",
              "action" : "indices:data/write/reindex",
              "status" : {
                "total" : 2703761,
                "updated" : 0,
                "created" : 5000,
                "deleted" : 0,
                "batches" : 6,
                "version_conflicts" : 0,
                "noops" : 0,
                "retries" : {
                  "bulk" : 0,
                  "search" : 0
                },
                "throttled_millis" : 0,
                "requests_per_second" : -1.0,
                "throttled_until_millis" : 0
              },
              "description" : """reindex from [old_index] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.new_field1= ctx._source.field1;ctx._source.remove("field1");', options={}, params={}} to [new_index]""",
              "start_time_in_millis" : 1561016910227,
              "running_time_in_nanos" : 2826725842,
              "cancellable" : true,
              "headers" : { }
            }
          }
        }
      }
    }

some-ip is the ip of the node that I am running reindexing on (the one hosting Kibana as I am running this on Kibana dev tools)

Running the same request with no reindex running in the background, I got:

{
  "nodes" : { }
}

Running:

GET _cat/indices/old_index,new_index

I got:

green open old_index         kTzZiAggRMG0y5K4Xt7xzQ 5 2 2703795 528414  13.3gb   4.3gb
green open new_index q123gbafShOSA4Vh45gC3g 5 2  504996      0 871.4mb 291.5mb
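
To put a number on the gap, here is a minimal Python sketch that parses the two rows above and subtracts the document counts. It assumes the default `_cat/indices` column order (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size); the helper names `parse_cat_indices` and `doc_shortfall` are made up for illustration.

```python
# Assumed default column order of `GET _cat/indices` when no `h=` is given.
COLUMNS = ["health", "status", "index", "uuid", "pri", "rep",
           "docs.count", "docs.deleted", "store.size", "pri.store.size"]

# The two rows exactly as posted above.
CAT_OUTPUT = """\
green open old_index         kTzZiAggRMG0y5K4Xt7xzQ 5 2 2703795 528414  13.3gb   4.3gb
green open new_index q123gbafShOSA4Vh45gC3g 5 2  504996      0 871.4mb 291.5mb
"""


def parse_cat_indices(text):
    """Return {index_name: row_dict} for whitespace-separated _cat/indices rows."""
    rows = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = dict(zip(COLUMNS, line.split()))
        rows[row["index"]] = row
    return rows


def doc_shortfall(rows, source, dest):
    """Number of documents counted in `source` but not in `dest`."""
    return int(rows[source]["docs.count"]) - int(rows[dest]["docs.count"])


rows = parse_cat_indices(CAT_OUTPUT)
missing = doc_shortfall(rows, "old_index", "new_index")
print(missing)  # 2198799
```

So roughly 2.2M documents never made it into "new_index".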

I am still having this issue, any help is appreciated!

What does the output of the task API look like once the reindex has finished? The output above shows that 5k documents had been processed, but I guess more were processed eventually, since 500k have been indexed.

Also, if you run the reindex without any script, does it copy all the data?
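
As a rough way to read the task status, progress can be estimated by adding the `created`, `updated` and `deleted` counters and comparing the sum with `total`. The sketch below does that in Python with the values from the task output posted earlier in this thread; `reindex_progress` is a hypothetical helper name, not part of any client library.

```python
def reindex_progress(status):
    """Return (processed, total, fraction) from a reindex task `status` object.

    `processed` is estimated as created + updated + deleted.
    """
    processed = status["created"] + status["updated"] + status["deleted"]
    total = status["total"]
    fraction = processed / total if total else 0.0
    return processed, total, fraction


# Values copied from the task status posted earlier in this thread.
status = {
    "total": 2703761,
    "updated": 0,
    "created": 5000,
    "deleted": 0,
    "batches": 6,
    "version_conflicts": 0,
    "noops": 0,
}

processed, total, fraction = reindex_progress(status)
print(f"{processed} of {total} documents processed ({fraction:.2%})")
```

If the task disappears from the task list while the sum is still far below `total`, the reindex most likely stopped early rather than completed.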

I checked the logs and found a problem in the mappings that caused the reindex to stop partway through. I fixed the mappings of the index, which resolved the issue.
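
For anyone hitting the same thing: when the reindex request is not run in the background, its response body contains a `failures` array, and a non-empty array means the request aborted early. Below is a minimal Python sketch for summarizing such a response. The sample response is hypothetical (the failure entry, its id and its reason are invented for illustration, not taken from my logs), and `summarize_reindex_response` is a made-up helper name.

```python
def summarize_reindex_response(response):
    """Summarize how many documents were written and why the rest failed."""
    total = response.get("total", 0)
    written = response.get("created", 0) + response.get("updated", 0)
    reasons = []
    for failure in response.get("failures", []):
        cause = failure.get("cause", {})
        reasons.append(
            f'{failure.get("id")}: {cause.get("type")}: {cause.get("reason")}'
        )
    return {
        "total": total,
        "written": written,
        "short_by": total - written,
        "reasons": reasons,
    }


# HYPOTHETICAL response: the counts echo numbers from this thread, but the
# failure entry is invented to show the shape of a mapping error.
sample_response = {
    "total": 2703761,
    "created": 504996,
    "updated": 0,
    "failures": [
        {
            "index": "new_index",
            "id": "example-doc-id",
            "cause": {
                "type": "mapper_parsing_exception",
                "reason": "failed to parse field [field2]",
            },
            "status": 400,
        }
    ],
}

summary = summarize_reindex_response(sample_response)
print(summary["short_by"])
for reason in summary["reasons"]:
    print(reason)
```

A `mapper_parsing_exception` in that list points at a mapping conflict on the destination index, which is the kind of problem I had to fix.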
