Issue syncing data between nodes

We have two nodes, node_a and node_b, on different servers. node_a is the main node where all the data needs to end up, whereas node_b is a temporary node where data is dumped and then needs to be synced to node_a.
Currently we are using the reindex method to sync the nodes, but by doing this we have seen data loss in node_a.
For example, node_a has the following document:

{
  "_index": "node_a",
  "_type": "fields",
  "_id": "AW9_qd-qSvQmY6fL_xPQ",
  "_score": 1,
  "_source": {
    "name": "BOB",
    "age": 25,
    "Location": "US",
    "other": "123"
  }
}

And node_b has

{
  "_index": "node_b",
  "_type": "fields",
  "_id": "AW9_qd-qSvQmY6fL_xPQ",
  "_score": 1,
  "_source": {
    "name": "BOB",
    "gender": "male"
  }
}

After reindexing, the output we expect is:

{
  "_index": "node_a",
  "_type": "fields",
  "_id": "AW9_qd-qSvQmY6fL_xPQ",
  "_score": 1,
  "_source": {
    "name": "BOB",
    "age": 25,
    "Location": "US",
    "other": "123",
    "gender": "male"
  }
}

Only the missing data should get updated (in the above case, the gender field), but instead the whole document is getting replaced.

Is there a way to achieve this with the reindex method, so that only the missing data is updated in the document?

This is not possible with the reindex API. The reindex API can only tell you whether there is a conflict between the docs (different versions). To have conflicts returned, you need to set "version_type": "external" in the dest section when you reindex; otherwise the version is not considered. https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-api-request
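For reference, a reindex request that reports version conflicts might look like the sketch below. It only builds the request body as a Python dict and sends nothing. The helper name build_reindex_body and the host http://node-b-host:9200 are placeholders I made up; the source.remote part is there only because node_a and node_b are on different servers, and the remote host would also have to be listed in reindex.remote.whitelist on the destination side.

```python
import json


def build_reindex_body(source_index, dest_index, remote_host=None):
    """Build a _reindex request body that keeps the source document versions.

    With "version_type": "external", a document whose version in the
    destination is the same or newer is reported as a version conflict
    instead of being overwritten. "conflicts": "proceed" counts those
    conflicts rather than aborting the whole request.
    """
    source = {"index": source_index}
    if remote_host is not None:
        # Hypothetical remote host; only needed for cross-cluster reindex.
        source["remote"] = {"host": remote_host}
    return {
        "conflicts": "proceed",
        "source": source,
        "dest": {"index": dest_index, "version_type": "external"},
    }


body = build_reindex_body("node_b", "node_a", remote_host="http://node-b-host:9200")
print(json.dumps(body, indent=2))
```

The printed JSON is what you would POST to /_reindex on the server holding node_a. Note that this still replaces or skips whole documents; it does not merge fields.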

There is a GitHub issue that discusses "Allow reindex to do update/upsert operations" here: https://github.com/elastic/elasticsearch/issues/17997
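In the meantime, one possible workaround is to do the merge on the client side: read each document from node_b, work out which fields the node_a copy lacks, and send only those as partial updates through the _bulk API. The sketch below covers just the merge and the bulk-body construction, with no network calls. The helper names missing_fields and bulk_update_lines are made up for illustration, and the _type key is included only because the documents above carry one (newer Elasticsearch versions drop it).

```python
import json


def missing_fields(dest_source, incoming_source):
    """Return only the fields present in incoming_source but absent from dest_source."""
    return {k: v for k, v in incoming_source.items() if k not in dest_source}


def bulk_update_lines(index, doc_id, partial_doc, doc_type=None):
    """Build the two NDJSON lines of a _bulk partial-update action."""
    meta = {"_index": index, "_id": doc_id}
    if doc_type is not None:
        meta["_type"] = doc_type  # assumption: only for versions that still use types
    action = {"update": meta}
    # doc_as_upsert creates the document if node_a does not have it yet.
    payload = {"doc": partial_doc, "doc_as_upsert": True}
    return json.dumps(action) + "\n" + json.dumps(payload) + "\n"


node_a_source = {"name": "BOB", "age": 25, "Location": "US", "other": "123"}
node_b_source = {"name": "BOB", "gender": "male"}

patch = missing_fields(node_a_source, node_b_source)
bulk_body = bulk_update_lines("node_a", "AW9_qd-qSvQmY6fL_xPQ", patch, doc_type="fields")

print(patch)  # {'gender': 'male'}
print(bulk_body, end="")
```

A partial update with "doc" is merged into the existing _source, so fields such as age and Location stay untouched. The resulting bulk_body would be POSTed to /_bulk on the server holding node_a with Content-Type: application/x-ndjson.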
