What is the best way to classify data?

wifi · September 24, 2021, 11:44am

I have data that originates from three sources, each saved in its own index:

metricbeat
manual data: manually ingested data
app data: data ingested by an app

The documents in each index have different fields and mappings but share one field. Let's say the common field is 'name'. That field is unique.

Now I want to analyse all the names that exist and see which indices each name exists in.

My approach was to first use reindex to take a snapshot of metricbeat. In an ingest pipeline I use a processor to set _id = name.

In the next step I use reindex with the manual data as source and the metricbeat snapshot as target. Here I also use the 'set' processor for _id = name, but this time I add another field, 'exists_in_manual_data' : 'yes'.

The idea is to 'enrich' the metricbeat snapshot. The problem is that reindex overwrites the existing document that has the same _id instead of just adding the new fields.
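One possible way around the overwrite is to send partial updates with an upsert instead of reindexing, so that an existing document keeps its fields and a missing one is created. The sketch below only builds the NDJSON body for the `_bulk` API using `update` actions with `doc_as_upsert`. The index name `metricbeat_snapshot` and the helper `build_upsert_actions` are made up for illustration, and sending the body to the cluster is left out.

```python
import json


def build_upsert_actions(names, target_index, flag_field):
    """Build an NDJSON body for the _bulk API.

    Each name becomes one partial update keyed by _id = name. With
    doc_as_upsert, an existing document is merged with the new field
    and a missing document is created from the partial doc.
    """
    lines = []
    for name in names:
        # Action line: which document to update.
        lines.append(json.dumps({"update": {"_index": target_index, "_id": name}}))
        # Payload line: the fields to merge into the document.
        lines.append(json.dumps({
            "doc": {"name": name, flag_field: "yes"},
            "doc_as_upsert": True,
        }))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical usage: names read from the manual data index.
body = build_upsert_actions(["Ben"], "metricbeat_snapshot", "exists_in_manual_data")
print(body)
```

Running the same step once per source index, with a different flag field each time, should leave one document per name that carries a flag for every index it was found in.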

I want the enriched documents in the snapshot index to look like this:

'{ "name" : "Ben", "exists_in_manual_data" : "yes", "exists_in_app_data" : "yes" }'

or like this:

'{ "name" : "Ben", "type" : ["snapshot", "manual", "app"] }'
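To make the second target shape concrete, here is a small client-side sketch of the merge I am after. It assumes the names of each index have already been fetched into plain Python lists. The function `classify_names` and the sample names other than "Ben" are made up for illustration.

```python
def classify_names(sources):
    """Merge per-source name lists into one document per name.

    `sources` maps a label (e.g. "snapshot") to the names found in
    that index. Each result records every label the name appeared under,
    in the order the sources were given.
    """
    merged = {}
    for label, names in sources.items():
        for name in names:
            labels = merged.setdefault(name, [])
            if label not in labels:  # guard against repeated names
                labels.append(label)
    return [{"name": name, "type": labels} for name, labels in sorted(merged.items())]


docs = classify_names({
    "snapshot": ["Ben", "Amy"],
    "manual": ["Ben"],
    "app": ["Ben", "Cid"],
})
for doc in docs:
    print(doc)
```

Each resulting dictionary could then be indexed with _id = name, which gives exactly one document per name.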

Is there an 'Elasticsearch' way to accomplish this?

I also thought about applying the set processor "exists_in_manual_data" : "yes" conditionally, something like 'if index manual_data contains name', but I couldn't find anything in the docs on how to use a query in the conditional 'if' statement.
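As far as I can tell, the 'if' condition only sees the document being processed, so the lookup itself would have to come from somewhere else. One candidate seems to be the enrich processor: a match enrich policy on the other index, followed by a conditional set on the looked-up field. The sketch below only builds the two request bodies. The policy name, the temporary field name and the helper `build_enrich_setup` are made up for illustration, and I have not verified this end to end.

```python
import json


def build_enrich_setup(source_index, match_field, flag_field):
    """Build an enrich policy body and an ingest pipeline body.

    The policy makes `source_index` searchable by `match_field`. The
    pipeline looks up each incoming document, sets `flag_field` when a
    match was found, then drops the temporary lookup field.
    """
    policy_name = f"{source_index}_names"    # hypothetical naming scheme
    lookup_field = f"{source_index}_match"   # temporary field for the lookup result
    policy = {
        "match": {
            "indices": source_index,
            "match_field": match_field,
            "enrich_fields": [match_field],
        }
    }
    pipeline = {
        "processors": [
            {"enrich": {
                "policy_name": policy_name,
                "field": match_field,
                "target_field": lookup_field,
                "ignore_missing": True,
            }},
            # The condition is evaluated against the current document only.
            {"set": {
                "if": f"ctx.{lookup_field} != null",
                "field": flag_field,
                "value": "yes",
            }},
            {"remove": {"field": lookup_field, "ignore_missing": True}},
        ]
    }
    return policy_name, policy, pipeline


name, policy, pipeline = build_enrich_setup("manual_data", "name", "exists_in_manual_data")
print(name)
print(json.dumps(policy, indent=2))
print(json.dumps(pipeline, indent=2))
```

The policy would have to be created and executed before the pipeline is used, and executed again whenever the source index changes, since the lookup works on a copy of the data taken at execution time.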

