I have "index_a" on a remote Elasticsearch cluster that looks like this:
{
_index: "index_a",
_type: "_doc",
_id: "1",
_score: 1,
_source: {
customer_id: "1234",
customer_name: "spider",
message: "does what ever"
}
},
{
_index: "index_a",
_type: "_doc",
_id: "2",
_score: 1,
_source: {
customer_id: "3333",
customer_name: "pig",
message: "spider-pid does"
}
}
I also have an "index_a" (yes, it's the same name!) on the current Elasticsearch cluster, the one I'm running the _reindex on, which looks like this:
{
_index: "index_a",
_type: "_doc",
_id: "2",
_score: 1,
_source: {
customer_id: "3333",
customer_name: "pig",
message: "spider-pid does"
}
},
{
_index: "index_a",
_type: "_doc",
_id: "3",
_score: 1,
_source: {
customer_id: "9876",
customer_name: "coronavirus",
message: "stay safe and at home"
}
}
As you can see, it contains a duplicate of a doc from the first "index_a" above (_id 2), but it also has new data that I want to keep!
What I want to end up with, on my current Elasticsearch cluster, is this index_b:
{
_index: "index_b",
_type: "_doc",
_id: "1",
_score: 1,
_source: {
customer_id: "1234",
customer_name: "spider",
message: "does what ever"
}
},
{
_index: "index_b",
_type: "_doc",
_id: "2",
_score: 1,
_source: {
customer_id: "3333",
customer_name: "pig",
message: "spider-pid does"
}
},
{
_index: "index_b",
_type: "_doc",
_id: "3",
_score: 1,
_source: {
customer_id: "9876",
customer_name: "coronavirus",
message: "stay safe and at home"
}
}
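To make the intended result concrete, here is a minimal plain-Python sketch of the merge I'm after. It makes no Elasticsearch calls; the dictionaries just mirror the sample docs above keyed by _id, and merge_by_id is a hypothetical helper name used only for illustration.

```python
# Plain-Python illustration of the desired merge; no Elasticsearch calls are made.
# Each dict maps _id -> _source, mirroring the sample docs shown above.
remote_index_a = {
    "1": {"customer_id": "1234", "customer_name": "spider", "message": "does what ever"},
    "2": {"customer_id": "3333", "customer_name": "pig", "message": "spider-pid does"},
}
current_index_a = {
    "2": {"customer_id": "3333", "customer_name": "pig", "message": "spider-pid does"},
    "3": {"customer_id": "9876", "customer_name": "coronavirus", "message": "stay safe and at home"},
}


def merge_by_id(*indices):
    """Union of docs by _id; on a collision, the later index overwrites the earlier one."""
    merged = {}
    for index in indices:
        merged.update(index)
    return merged


index_b = merge_by_id(remote_index_a, current_index_a)
print(sorted(index_b))  # ['1', '2', '3']
```

The doc with _id 2 is identical in both sources, so it does not matter here which copy wins.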
So I know for a fact that I can reach this result with two separate _reindex requests: the 1st _reindex goes from the remote cluster's index_a to the current cluster's index_b, and the 2nd _reindex goes from the current cluster's index_a to the current cluster's index_b.
But running those two _reindex requests is VERY wasteful with big data, because each request basically goes over every doc ID one by one and writes/overwrites it.
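For reference, the two-request approach described above could be sketched like this. It is only a rough Python sketch that builds and prints the two request bodies without sending anything; reindex_body is a hypothetical helper, and the host and URL are the same placeholders I use elsewhere in this question, not real addresses.

```python
import json

# Placeholder host taken from this question; replace it with the real remote address.
REMOTE_HOST = "http://remote_cluster/"


def reindex_body(source_index, dest_index, remote_host=None):
    """Build a _reindex request body; "remote" is included only for a remote source."""
    source = {"index": source_index}
    if remote_host is not None:
        source["remote"] = {"host": remote_host}
    return {"source": source, "dest": {"index": dest_index}}


# 1st request: remote cluster index_a -> current cluster index_b
first = reindex_body("index_a", "index_b", remote_host=REMOTE_HOST)
# 2nd request: current cluster index_a -> current cluster index_b
second = reindex_body("index_a", "index_b")

# Print both bodies as they would be POSTed to the current cluster.
for body in (first, second):
    print("POST http://current_cluster/_reindex")
    print(json.dumps(body, indent=2))
```

Both bodies target the same "dest" index, which is why every doc that exists in both sources gets written twice.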
When trying to do this in a single _reindex request, I tried this:
POST http://current_cluster/_reindex
{
"source": {
"remote": {
"host": "http://remote_cluster/"
},
"index": ["index_a-from-remote", "index_a-of-current"] // renamed here to make it clear which cluster each index is on
},
"dest": {
"index": "index_b"
}
}
The response says there is no "index_a-of-current" on the remote cluster, which makes sense: with "remote" set, this type of _reindex request is built to only read indices from the remote Elasticsearch cluster.
So my question is:
Is there a way to perform a single _reindex request that takes both "index_a" from the remote cluster and "index_a" from the current cluster, and reindexes them both into "index_b" on the current cluster?
I'd be happy if anyone could shed some light on this, as I've tried a bunch of other things in the request and read the Reindex API documentation, and haven't found an answer yet.
Thanks for any help!