Combining documents in two indices correlated by their ID, but the fieldname is different

Hi! I am fairly new to the Elastic Stack, and I was wondering whether what the title suggests is possible. Say I have two types of documents, one for cars and one for drivers. I want to combine the two documents into one by correlating the driver's driverId with the car's ownerId. The values are the same, but the field names differ. I have read a little about transforms and was wondering whether they would fit my case.

The reason I ask is that our system is event-driven. If I were to do this the usual way, I would need a local ID-lookup service, as well as some really wonky way to create a single document from the two, because each event does not represent the state of a driver or car but an action applied to that state (driver created, driverName updated, or carModel changed, for example). Any tips would really be helpful.

Thanks

Hi,

Is it a one-to-many relation? Can a driver own multiple cars?
Can you share sample data and a sample of the desired output?

Depending on the answer to the above question, one possible solution is an enrich policy keyed on the driver ID that 'enriches' the car index with the owner's/driver's information.


Also, please share how you would like the documents to be combined, i.e. what kind of aggregation/script would you use for combining them?


As far as I know, this is a one-to-many relation, yes. Here is an example:

Driver:
driverId
name
country
countryId

Car:
carId
model
modelId
ownerId: same as driverId

desired output:
All fields of driver
list of carId and models (or any form of this)

This is just an example; anything that gives the driver information about which carIds and models they have would do the job.

Due to my lack of experience with Elastic, I'm not sure what script/aggregation I would be using...

Here is an enrich processor example.
One caveat with this strategy is that you should always index the car documents first and execute the enrich policy before you index the drivers.

PUT /test_driver/
{
  "mappings":{
    "properties": {
      "driverId":{"type":"keyword"},
      "name":{"type":"keyword"},
      "country":{"type":"keyword"},
      "countryId":{"type":"keyword"}
    }
  }
}

PUT /test_car/
{
  "mappings": {
    "properties": {
      "carId":{"type": "keyword"},
      "model":{"type": "keyword"},
      "modelId":{"type": "keyword"},
      "ownerId":{"type": "keyword"}
    }
  }
}

POST /test_car/_bulk
{"index":{}}
{"carId":"car001","model":"model001","ownerId":"001"}
{"index":{}}
{"carId":"002","model":"model002","ownerId":"001"}

PUT /_enrich/policy/test_car_policy
{
  "match":{
    "indices":"test_car",
    "match_field":"ownerId",
    "enrich_fields":["carId","model","modelId"]
  }
}

PUT /_enrich/policy/test_car_policy/_execute

PUT /_ingest/pipeline/test_car_enrich
{
  "description": "car enrich",
  "processors": [
    {
      "enrich": {
        "policy_name": "test_car_policy",
        "field":"driverId",
        "target_field":"car",
        "max_matches":128
      }
    }
  ]
}

POST /_ingest/pipeline/test_car_enrich/_simulate
{
  "docs":[
    {
      "_index":"test_driver",
      "_source":{
        "driverId":"001"
      }
    }
  ]
}

PUT /test_driver/_settings
{
  "index":{
    "default_pipeline": "test_car_enrich"
  }
}

GET /test_driver

POST /test_driver/_bulk
{"index":{}}
{"driverId":"001","name":"foo"}
{"index":{}}
{"driverId":"002","name":"baa"}

GET /test_driver/_search
{
  "hits" : {
    "hits" : [
      {
        "_index" : "test_driver",
        "_type" : "_doc",
        "_id" : "pvQQfH4Bf0nakUP8c2Xp",
        "_score" : 1.0,
        "_source" : {
          "driverId" : "001",
          "car" : [
            {
              "model" : "model001",
              "ownerId" : "001",
              "carId" : "car001"
            },
            {
              "model" : "model002",
              "ownerId" : "001",
              "carId" : "002"
            }
          ],
          "name" : "foo"
        }
      },
      {
        "_index" : "test_driver",
        "_type" : "_doc",
        "_id" : "p_QQfH4Bf0nakUP8c2Xp",
        "_score" : 1.0,
        "_source" : {
          "driverId" : "002",
          "name" : "baa"
        }
      }
    ]
  }
}
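
The ordering caveat above matters after the initial load too: as far as I understand it, an enrich policy works from a snapshot taken when the policy is executed, and driver documents that are already indexed are not re-enriched on their own. A minimal sketch of a refresh after new car documents arrive, reusing the names from the example above (not run against this exact data; it assumes reprocessing the whole driver index is acceptable for your data volume):

```console
# Rebuild the enrich snapshot so it includes the newly indexed cars
PUT /_enrich/policy/test_car_policy/_execute

# Re-run the enrich pipeline over the driver documents that already exist
POST /test_driver/_update_by_query?pipeline=test_car_enrich
```

For an event-driven system this means the policy has to be re-executed on some schedule, so how stale the car list is allowed to be is a design decision.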


Another way to do this is to use a transform, as you said. It may also fit your case.


Thank you so much for taking the time, I will look into it!

Using a transform to join multiple indices is not straightforward, but these posts will help you use transforms.
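
To give a rough idea of the shape such a transform could take, here is an untested sketch rather than a recommended setup. It assumes a version where transforms accept runtime_mappings in the source (7.12 or later, I believe). The transform name driver_with_cars, the destination index, and the runtime field joinId are hypothetical names made up for this illustration. The runtime field papers over the different field names (driverId vs ownerId) so that both indices can be grouped on one key, and a scripted_metric collects the cars for each driver:

```console
PUT _transform/driver_with_cars
{
  "source": {
    "index": ["test_driver", "test_car"],
    "runtime_mappings": {
      "joinId": {
        "type": "keyword",
        "script": {
          "source": "if (doc.containsKey('driverId') && doc['driverId'].size() > 0) { emit(doc['driverId'].value); } else if (doc.containsKey('ownerId') && doc['ownerId'].size() > 0) { emit(doc['ownerId'].value); }"
        }
      }
    }
  },
  "dest": { "index": "driver_with_cars" },
  "pivot": {
    "group_by": {
      "driverId": { "terms": { "field": "joinId" } }
    },
    "aggregations": {
      "cars": {
        "scripted_metric": {
          "init_script": "state.cars = [];",
          "map_script": "if (doc.containsKey('carId') && doc['carId'].size() > 0) { state.cars.add(['carId': doc['carId'].value, 'model': doc['model'].size() > 0 ? doc['model'].value : null]); }",
          "combine_script": "return state.cars;",
          "reduce_script": "def all = []; for (s in states) { if (s != null) { all.addAll(s); } } return all;"
        }
      }
    }
  }
}

# Start the transform once the definition looks right
POST _transform/driver_with_cars/_start
```

Sending the same body to POST _transform/_preview first should show the resulting documents without writing anything. Driver fields such as name could be collected in the same scripted_metric in a similar way.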
