How to merge two indexes based on a common field in Elasticsearch?

Given two indexes, test and amazon:

PUT /test/_doc/1
{
  "customer_id" : "1234",
  "customer_name" : "origin",
  "farm_name": "Origin Delta"
}

PUT /amazon/_doc/2
{
  "customer_id" : "1234",
  "farm_name": "Origin Delta"
}

I want to merge these two indexes based on customer_id.

Question: Is there a way to merge the two indexes, test and amazon, so that the amazon index gets a customer_name field with the value "origin"?

Attempt 1:

POST /test,amazon/_forcemerge
POST /_forcemerge
GET /test,amazon/_search

output:

"hits" : [
  {
    "_index" : "amazon",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 1.0,
    "_source" : {
      "customer_id" : "1234",
      "farm_name" : "Origin Delta"
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "customer_id" : "1234",
      "customer_name" : "origin",
      "farm_name" : "Origin Delta"
    }
  }
]

Result Analysis:

  • This did not work. I expected customer_name: "origin" to be added to the amazon index, but the amazon document is unchanged.

Attempt 2:

POST _reindex
{
  "source": {
    "index": ["amazon", "test"]
  },
  "dest": {
    "index": "new_amazon"
  }
}

GET /new_amazon/_search

output:

"hits" : [
  {
    "_index" : "new_amazon",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 1.0,
    "_source" : {
      "customer_id" : "1234",
      "farm_name" : "Origin Delta"
    }
  },
  {
    "_index" : "new_amazon",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "customer_id" : "1234",
      "customer_name" : "origin",
      "farm_name" : "Origin Delta"
    }
  }
]

Result Analysis:

  • This did not work either. I expected customer_name: "origin" to be added to the amazon document, but both documents were copied into new_amazon unchanged.

forcemerge does something completely different: it merges the segments inside each shard (think of it as a kind of compaction, which happens automatically in the background anyway). reindex lets you index the documents of one index into another, optionally modifying them along the way, but it does not merge two indices into one.

It is very hard to come up with generic logic for merging two indices: in many cases there is a single join field, but there could just as easily be several, or the join could depend on something else entirely, such as a calculation.

In this case you are better off writing your own logic. You might want to take a look at something called entity centric indexing, see an old elasticon talk here https://www.elastic.co/elasticon/2015/sf/building-entity-centric-indexes
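
To make "your own logic" a bit more concrete, here is a minimal sketch of a client-side join, assuming the documents have already been fetched as plain hit dictionaries (for example with a search or scroll). The helper names build_lookup and enrich_actions are made up for this illustration, and the action layout follows what the Python client's helpers.bulk accepts for partial updates. It only covers the simple case of one join field and one copied field.

```python
from typing import Dict, Iterable, Iterator


def build_lookup(source_docs: Iterable[dict], key: str, field: str) -> Dict[str, object]:
    """Map each join-key value to the field value that should be copied over."""
    lookup = {}
    for doc in source_docs:
        src = doc["_source"]
        if key in src and field in src:
            lookup[src[key]] = src[field]
    return lookup


def enrich_actions(target_docs: Iterable[dict], lookup: Dict[str, object],
                   key: str, field: str, index: str) -> Iterator[dict]:
    """Yield one partial-update action for every target doc with a match."""
    for doc in target_docs:
        value = lookup.get(doc["_source"].get(key))
        if value is not None:
            yield {
                "_op_type": "update",
                "_index": index,
                "_id": doc["_id"],
                "doc": {field: value},
            }


# Sample hits mirroring the two documents from the question.
test_hits = [{"_index": "test", "_id": "1", "_source": {
    "customer_id": "1234", "customer_name": "origin", "farm_name": "Origin Delta"}}]
amazon_hits = [{"_index": "amazon", "_id": "2", "_source": {
    "customer_id": "1234", "farm_name": "Origin Delta"}}]

lookup = build_lookup(test_hits, "customer_id", "customer_name")
actions = list(enrich_actions(amazon_hits, lookup, "customer_id", "customer_name", "amazon"))
print(actions)
```

The resulting actions could then be sent with the bulk API. Holding the lookup in memory only works while the source index is small enough; for larger data a streaming approach is probably the better fit.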

Hope this helps!

--Alex

You could use the scroll API to run a single search across both indices, sorted by customer_id. That would give you a single stream of results in which related docs sit next to each other. A custom script could consume this stream, merge the doc pairs, and then add each merged doc to a new index using the bulk API.
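
A rough sketch of the merging part of such a script, assuming the hits arrive already sorted by customer_id and that every hit contains that field. The function name merge_sorted_hits and the choice to reuse the join value as the new document ID are assumptions made for this example. With the Python client the stream could come from helpers.scan with preserve_order=True, and under default dynamic mapping the sort would probably need to target customer_id.keyword.

```python
from itertools import groupby
from typing import Iterable, Iterator


def merge_sorted_hits(hits: Iterable[dict], key: str, dest_index: str) -> Iterator[dict]:
    """Merge consecutive hits that share the same join-key value.

    Relies on the input being sorted by `key`, so related docs are adjacent.
    When two docs disagree on a field, the doc that comes later wins.
    """
    for key_value, group in groupby(hits, key=lambda h: h["_source"][key]):
        merged = {}
        for hit in group:
            merged.update(hit["_source"])
        # One index action per distinct key value, ready for the bulk API.
        yield {"_index": dest_index, "_id": key_value, "_source": merged}


# Sample stream mirroring the search output from the question.
sorted_hits = [
    {"_index": "amazon", "_id": "2", "_source": {
        "customer_id": "1234", "farm_name": "Origin Delta"}},
    {"_index": "test", "_id": "1", "_source": {
        "customer_id": "1234", "customer_name": "origin", "farm_name": "Origin Delta"}},
]

merged_docs = list(merge_sorted_hits(sorted_hits, "customer_id", "new_amazon"))
print(merged_docs)
```

Because only one group is held in memory at a time, this keeps working when the indices are too large to load in full.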
