Why does reindex cause updates?


(Matthew Field) #1

I am carrying out a reindex operation into a new index, but I get results like this:

INFO:__main__:completed task Xp8tUzxiScenFu_pV7gmQQ:1558654
INFO:__main__:{'deleted': 0, 'batches': 149, 'version_conflicts': 0, 'total': 148161, 'created': 148159, 'noops': 0, 'throttled_until_millis': 0, 'updated': 2, 'requests_per_second': -1.0, 'throttled_millis': 0, 'retries': {'bulk': 0, 'search': 0}}

So nearly all documents are created, but 2 are reported as updated.

Before the reindex operation, the destination index does not exist; the reindex creates it.
Why are there any updates at all if I am simply reindexing into a new, empty index?

I do not understand how this is possible.

Thanks for any help you can give.


#2

Could you show us the body of the reindex request? And do you set the IDs manually?


(Matthew Field) #3

Thanks, please find the body of the reindex request below. I don't set the IDs manually. All I do is include a script that turns the operation into a noop when the ID is "", because I was getting failures caused by non-existent IDs (I didn't understand that either, but the script fixed it).

Admittedly, I am going from daily indices to a monthly index, but even so, the chance of two IDs colliding by accident should be far too low to explain the number of "updates" I am getting.
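
A quick back-of-the-envelope check of that intuition, as a Python sketch. It uses the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1)/(2N)) with N = 2^bits, and it assumes, purely for illustration, that each ID behaves like a uniformly random 120-bit value (roughly what a 20-character base64 ID can hold). Real Elasticsearch auto-generated IDs are not random, so this is an idealised model; n is the total from the task result above.

```python
import math

def collision_probability(n: int, id_bits: int) -> float:
    """Birthday-bound estimate that at least two of n random IDs collide."""
    space = 2.0 ** id_bits
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny
    return -math.expm1(-n * (n - 1) / (2.0 * space))

# n = 148161 documents (from the task result); 120 bits is an assumed ID size
print(collision_probability(148161, 120))
```

`math.expm1` is used because `1 - math.exp(-x)` rounds to exactly 0.0 for an x this small. The result is on the order of 1e-26, so under this assumption random collisions cannot account for even one update; genuinely duplicated IDs in the source data are the more plausible cause.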

curl -XPOST "http://localhost:9200/_reindex?wait_for_completion=false" -H 'Content-Type: application/json' -d'
{
  "conflicts": "proceed",
  "source": {
    "remote": {
      "host": "http://xxx.xx.x.xxx:9200",
      "socket_timeout": "1m",
      "connect_timeout": "10s"
    },
    "index": "logstash-2017.01.*",
    "type": ["api-log", "search-log"]
  },
  "dest": {
    "index": "logstash-2017.01"
  },
  "script": {
    "source": "if (ctx._id==\"\"){ctx.op=\"noop\"}else{ctx._source.doctype=ctx._type;ctx._type=\"doc\";}",
    "lang": "painless"
  }
}'
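
For reference, here is a Python mirror of what the Painless script in this request does to each document. It is an illustrative re-implementation, not Elasticsearch code, and the `ctx` dict layout and the `dest_key` helper are simplified assumptions. It shows two things: the noop branch only fires for an empty-string `_id`, and documents from both source types end up with the same `_type` ("doc"), so an `_id` shared by an "api-log" and a "search-log" document maps to one and the same destination document.

```python
def apply_script(ctx: dict) -> dict:
    """Mimic the reindex script: noop on an empty _id, else move _type into _source."""
    if ctx["_id"] == "":
        ctx["op"] = "noop"  # document is skipped and counted under "noops"
    else:
        ctx["_source"]["doctype"] = ctx["_type"]  # keep the original type as a field
        ctx["_type"] = "doc"                      # every document now shares one type
    return ctx

DEST_INDEX = "logstash-2017.01"  # fixed destination from the request body

def dest_key(ctx: dict) -> tuple:
    """Hypothetical helper: identity of the destination document after the script."""
    return (DEST_INDEX, ctx["_type"], ctx["_id"])

# Hypothetical documents that share an _id across the two source types
api_doc = apply_script({"_type": "api-log", "_id": "abc", "_source": {}})
search_doc = apply_script({"_type": "search-log", "_id": "abc", "_source": {}})
print(dest_key(api_doc) == dest_key(search_doc))  # True: same destination document
```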

#4

Thank you,
What is the error message you get when you remove the "conflicts": "proceed"?
I noticed that the task response shows no noop documents ('noops': 0). So maybe this condition is not working as intended and led to the updates?
Could you also check that the document IDs are unique across the types "api-log" and "search-log"? You're reindexing both into the same index, "logstash-2017.01".
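
To illustrate that suggestion, here is a minimal Python sketch of how indexing into an empty index splits into "created" and "updated": the first document with a given `_id` is created, and any later document with the same `_id` overwrites it and counts as updated. This is a simplified model of the default index operation, and it matches the shape of the task result, where created (148159) plus updated (2) equals total (148161). The ID list is hypothetical; in practice it would have to be collected from the source indices (for example with a scroll over `logstash-2017.01.*`), which is not shown here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def simulate_reindex(doc_ids: Iterable[str]) -> Tuple[int, int]:
    """Return (created, updated) for IDs written in order into an empty index."""
    seen = set()
    created = updated = 0
    for doc_id in doc_ids:
        if doc_id in seen:
            updated += 1  # same _id already written: the document is overwritten
        else:
            seen.add(doc_id)
            created += 1  # first time this _id reaches the destination
    return created, updated

def duplicate_ids(doc_ids: Iterable[str]) -> Dict[str, int]:
    """Map each _id that appears more than once to its number of occurrences."""
    return {doc_id: n for doc_id, n in Counter(doc_ids).items() if n > 1}

# Hypothetical IDs gathered from several daily indices and both types
ids = ["a1", "b2", "c3", "a1", "d4", "b2"]
print(simulate_reindex(ids))  # (4, 2)
print(duplicate_ids(ids))     # {'a1': 2, 'b2': 2}
```

If `duplicate_ids` returns entries for the real source IDs, each extra occurrence would account for one "updated" document.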

