As referenced here:
I'm currently attempting to migrate our Elasticsearch data to be 2.0 compatible (i.e. no dots in field names) in preparation for an upgrade from 1.x to 2.x.
I've written a program that runs through the data (in batches) that sits in a one-node cluster, and renames the fields, re-indexing the documents using the Bulk API.
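For reference, the rename-and-reindex step looks roughly like the sketch below. This is a simplified illustration, not my actual program: the helper names `rename_dotted_keys` and `bulk_lines` are made up for this post, and replacing dots with underscores is an assumption based on the `url.path` to `url_path` change shown further down. It only builds the newline-delimited body for the Bulk API; sending it to the cluster is left out.

```python
import json


def rename_dotted_keys(value, replacement="_"):
    """Recursively replace dots in dict keys; values are left untouched.

    Note: if a document already has both "a.b" and "a_b", the two keys
    collide and one value silently wins. Not handled in this sketch.
    """
    if isinstance(value, dict):
        return {
            key.replace(".", replacement): rename_dotted_keys(child, replacement)
            for key, child in value.items()
        }
    if isinstance(value, list):
        return [rename_dotted_keys(item, replacement) for item in value]
    return value


def bulk_lines(hits):
    """Yield Bulk API lines: one "index" action line, then the new source."""
    for hit in hits:
        action = {
            "index": {
                "_index": hit["_index"],
                "_type": hit["_type"],
                "_id": hit["_id"],
            }
        }
        yield json.dumps(action)
        yield json.dumps(rename_dotted_keys(hit["_source"]))


# One hit shaped like the search results in this post (trimmed).
hit = {
    "_index": "status",
    "_type": "status",
    "_id": "http://static.photosite.com/80018335.jpg",
    "_source": {
        "url": "http://static.photosite.com/80018335.jpg",
        "metadata": {"url.path": ["http://www.photosite.com/80018335"]},
    },
}

# The Bulk API expects every line, including the last, to end with a newline.
body = "\n".join(bulk_lines([hit])) + "\n"
print(body)
```

Each batch produces one such body, with the same `_index`, `_type` and `_id` as the original document, so I expected the existing document to be overwritten.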
At some point it all goes wrong: the total number of documents coming back from my query (those still to be "upgraded") stops changing, even though it should be counting down.
Initially I thought the re-indexing wasn't working. But when I pick a document and query for it to see if it's changing, I can see that it is.
However, when I query documents for a specific field within that document I get two results with the same ID. One of the results has the upgraded field, the other one does not - so the previous query must be selecting the one with the highest version?
On further inspection I can see that they come from different shards:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 19.059433,
"hits" : [ {
"_shard" : 0,
"_node" : "FxbpjCyQRzKfA9QvBbSsmA",
"_index" : "status",
"_type" : "status",
"_id" : "http://static.photosite.com/80018335.jpg",
"_version" : 2,
"_score" : 19.059433,
"_source":{"url":"http://static.photosite.com/80018335.jpg","metadata":{"url.path":["http://www.photosite.com/80018335"],"source":["http://www.photosite.com/80018335"],"longitude":["104.507755"],"latitude":["21.601669"]}},
...
}, {
"_shard" : 3,
"_node" : "FxbpjCyQRzKfA9QvBbSsmA",
"_index" : "status",
"_type" : "status",
"_id" : "http://static.photosite.com/80018335.jpg",
"_version" : 27,
"_score" : 17.607681,
"_source":{"url":"http://static.photosite.com/80018335.jpg","metadata":{"url_path":["http://www.photosite.com/80018335"],"source":["http://www.photosite.com/80018335"],"longitude":["104.507755"],"latitude":["21.601669"]}},
...
} ]
}
}
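To check how widespread this is, a small script along these lines can group the hits of a search response by `_id` and report any ID that shows up on more than one shard. It is only a sketch: the function name `ids_on_multiple_shards` is made up, and it assumes each hit carries a `_shard` value as in the output above.

```python
from collections import defaultdict


def ids_on_multiple_shards(response):
    """Map each _id found on more than one shard to its sorted shard list."""
    shards_by_id = defaultdict(set)
    for hit in response["hits"]["hits"]:
        shards_by_id[hit["_id"]].add(hit["_shard"])
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_by_id.items()
        if len(shards) > 1
    }


# Trimmed version of the response above: same _id on shards 0 and 3.
response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_shard": 0, "_id": "http://static.photosite.com/80018335.jpg", "_version": 2},
            {"_shard": 3, "_id": "http://static.photosite.com/80018335.jpg", "_version": 27},
        ],
    }
}

duplicates = ids_on_multiple_shards(response)
print(duplicates)
```

For the response above this reports the single ID with shards 0 and 3.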
Why is Elasticsearch allowing this, and how can I prevent it? What I need is for the re-index to overwrite the existing document rather than create a second copy. Any help (or alternative methods of upgrading the data) is appreciated.
Elasticsearch version: 1.7
No. of nodes: 1
No. of shards in total: 17
Shards holding my index: 5