I am trying to use debezium in a setup where each table is its own index. I want to create a merged index

In the debezium setup i'll have an index for each table. Lets say I have the three sources indicies an Owner index, a pets index, treats index that each pet eats. How can I have a merged index where it will include a owner and the pets they own, the treats the pets eat.

I understand we can use the enrichment policies but is there a way to do a merged index. The messages will only hit the three source indexes.

Example

Owner index

[
  {
    "_index": "owner",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.0,
    "_source": {
      "owner_id": 1,
      "first_nm": "John",
      "last_nm": "Doe"
    }
  },
  {
    "_index": "owner",
    "_type": "_doc",
    "_id": "2",
    "_score": 1.0,
    "_source": {
      "owner_id": 2,
      "first_nm": "Jane",
      "last_nm": "Doe"
    }
  }
]

Pet index
[
  {
    "_index": "pets",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.0,
    "_source": {
      "owner_id": 1,
      "pet_nm": "Bolt",
      "pet_id": 1
    }
  },
  {
    "_index": "pets",
    "_type": "_doc",
    "_id": "2",
    "_score": 1.0,
    "_source": {
      "owner_id": 1,
      "pet_nm": "Sparky",
      "pet_id": 2
    }
  },
  {
    "_index": "pets",
    "_type": "_doc",
    "_id": "3",
    "_score": 1.0,
    "_source": {
      "owner_id": 2,
      "pet_nm": "Yellow",
      "pet_id": 3
    }
  },
  {
    "_index": "pets",
    "_type": "_doc",
    "_id": "4",
    "_score": 1.0,
    "_source": {
      "owner_id": 2,
      "pet_nm": "Winnie",
      "pet_id": 4
    }
  }
]

Treats
[
  {
    "_index": "treats",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.0,
    "_source": {
      "treat_id": 1,
      "food_nm": "decent food",
      "pet_id": 1
    }
  },
  {
    "_index": "treats",
    "_type": "_doc",
    "_id": "2",
    "_score": 1.0,
    "_source": {
      "treat_id": 2,
      "food_nm": "alright food",
      "pet_id": 1
    }
  },
  {
    "_index": "treats",
    "_type": "_doc",
    "_id": "3",
    "_score": 1.0,
    "_source": {
      "treat_id": 3,
      "treat_nm": "best food",
      "pet_id": 2
    }
  },
  {
    "_index": "treats",
    "_type": "_doc",
    "_id": "4",
    "_score": 1.0,
    "_source": {
      "treat_id": 4,
      "food_nm": "ok food",
      "pet_id": 2
    }
  }
]


Merged index needed

[
  {
    "_index": "owner",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.0,
    "_source": {
      "owner_id": 1,
      "first_nm": "John",
      "last_nm": "Doe",
      "pets": [
        {
          "pet_id": 1,
          "pet_nm": "Sparky"
        },
        {
          "pet_id": 2,
          "pet_nm": "Sparky"
        }
      ],
      "treat": [
        {
          "pet_id": 1,
          "treat_id": 1,
          "food_nm": "decent_food"
        },
        {
          "pet_id": 2,
          "treat_id": 2,
          "food_nm": "alright food"
        },
      ]
    }
  } etc..
]

This seems like a question for the Debezium community as there is no way to merge indices the way you describe (possibly apart from using enrichment policies) within Elasticsearch.

Will do on messaging the debezium community. Is there a possibility to use the transform api to get something like I mentioned in the topic above. I just made an edit.

No, you want something like a JOIN, which does not exist in Elasticsearch.

You need to normalize your data before indexing it.

2 Likes

It sounds like you want to replicate a join or materialized view of some sorts. This is in my opinion something that should be done before data is ingested into Elasticsearch.

1 Like