What is the best way to classify data?

I have data that originates from three sources and is therefore stored in three separate indices:

  • metricbeat
  • manual data: manually ingested data
  • app data: data ingested by an app

The documents in each index have different fields and mappings, but they have one field in common. Let's say the common field is 'name'. That field is unique.

Now I want to analyse the data and see all the names that exist. I also want to see in which indices each name exists.

My approach was to first use reindex to create a snapshot of metricbeat. In an ingest pipeline, I use the 'set' processor to set _id = name.

In the next step I use reindex with the manual data as source and the metricbeat snapshot as destination. Here I also use the 'set' processor for _id = name, but this time I add another field, 'exists_in_manual_data' : 'yes'.
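
Roughly, this second step corresponds to the request bodies below, written as Python dicts for readability. This is only a sketch of the setup described above: the pipeline name `mark_manual` and the index names `manual_data` and `metricbeat_snapshot` are placeholders, not the real names.

```python
import json

# Body for PUT _ingest/pipeline/mark_manual (pipeline name is a placeholder).
pipeline_body = {
    "processors": [
        # Use the shared 'name' field as the document id.
        {"set": {"field": "_id", "value": "{{{name}}}"}},
        # Flag every document passing through as present in the manual data.
        {"set": {"field": "exists_in_manual_data", "value": "yes"}},
    ]
}

# Body for POST _reindex (index names are placeholders).
reindex_body = {
    "source": {"index": "manual_data"},
    "dest": {"index": "metricbeat_snapshot", "pipeline": "mark_manual"},
}

print(json.dumps(pipeline_body, indent=2))
print(json.dumps(reindex_body, indent=2))
```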

The idea is to 'enrich' the metricbeat snapshot. The problem is that reindex overwrites an existing document that has the same _id instead of just adding the new fields to it.
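
The difference between what happens and what is wanted can be shown with a small Python model in which a plain dict stands in for the snapshot index and its keys play the role of _id. It does not talk to Elasticsearch; `overwrite` and `merge` are hypothetical helpers, and the `cpu` field is a made-up example of a metricbeat field.

```python
def overwrite(index, doc_id, doc):
    """Model of the observed behaviour: the incoming document replaces the stored one."""
    index[doc_id] = dict(doc)


def merge(index, doc_id, doc):
    """Model of the desired behaviour: incoming fields are added to the stored document."""
    index.setdefault(doc_id, {}).update(doc)


manual_doc = {"name": "Ben", "exists_in_manual_data": "yes"}

# What reindex currently does: the metricbeat fields are lost.
observed = {"Ben": {"name": "Ben", "cpu": 0.4}}
overwrite(observed, "Ben", manual_doc)

# What is wanted: the metricbeat fields survive and the flag is added.
desired = {"Ben": {"name": "Ben", "cpu": 0.4}}
merge(desired, "Ben", manual_doc)

print(observed["Ben"])
print(desired["Ben"])
```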

I want the enriched documents in the snapshot index to look like this:

'{ "name" : "Ben", "exists_in_manual_data" : "yes", "exists_in_app_data" : "yes" }'

or like this:

'{ "name" : "Ben", "type" : ["snapshot", "manual", "app"] }'
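
To make the second form concrete, here is a minimal Python sketch of the classification being asked for, assuming the names from each index have already been collected into sets. The sample names and the `classify` helper are made up for illustration; this is not an Elasticsearch feature.

```python
def classify(names_by_source):
    """Map each name to the list of sources it appears in, in source order."""
    result = {}
    for source, names in names_by_source.items():
        for name in names:
            result.setdefault(name, []).append(source)
    return [{"name": name, "type": types} for name, types in sorted(result.items())]


# Hypothetical sample data: one set of names per index.
names_by_source = {
    "snapshot": {"Ben", "Ann"},
    "manual": {"Ben"},
    "app": {"Ben", "Cleo"},
}

docs = classify(names_by_source)
for doc in docs:
    print(doc)
```

With this sample data, "Ben" ends up with all three types, while "Ann" and "Cleo" each get only the one source they appear in.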

Is there an 'Elasticsearch' way to accomplish this?

I also thought about applying the set processor "exists_in_manual_data" : "yes" conditionally, i.e. only if the index manual_data contains the name, but I couldn't find anything in the docs on how to use a query in the processor's 'if' condition.