Best strategy to "JOIN" data from different .json files (Data enrichment? Rollup? Transform? Dictionary?)

I would say that it is way easier to enrich the data before indexing it.

Rollups are deprecated; they were replaced by downsampling, but that only works for metrics and is not used to enrich data.

Transforms are also not used to enrich data, but to summarize it, for example getting the last login of a user.

To enrich the data you can use some filters in Logstash, like the translate filter, or the enrich processor in an Elasticsearch ingest pipeline.

Personally I do not like using the enrich processor; it is less flexible than the translate filter in Logstash, and I only use it when I'm not using Logstash.

So the best way in my opinion would be to use the translate filter.

If your events arriving in Logstash look like this:

{ "employee_id":1, "name":"John" }

You would need at least two translate filters: one to get the company_id for the employee, and another to get the company_name for that company_id.

The first dictionary could be like this, where the key is the employee id and the value is the company id:

"1": "1001"
"2": "1001"

And in the second, the key would be the company id and the value the company name:

"1001": "ACME"
"1002": "Capsule"

Your translate filters could look like this:

translate {
	source => "employee_id"
	target => "company_id"
	dictionary_path => "/path/to/the/file/company_id.yml"
	# reload the dictionary file every 60 seconds
	refresh_interval => 60
	# value used for company_id when the employee_id is not in the dictionary
	fallback => "unknown company id"
}

translate {
	source => "company_id"
	target => "company_name"
	dictionary_path => "/path/to/the/file/company_name.yml"
	refresh_interval => 60
	fallback => "unknown company name"
	# drop the intermediate company_id field after a successful lookup
	remove_field => ["company_id"]
}

This should give you the result you expect.
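For the sample event above, and assuming employee 1 maps to company 1001 (ACME) as in the example dictionaries, the enriched event would look something like this:

{ "employee_id":1, "name":"John", "company_name":"ACME" }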

Can this be automated? Yes: the dictionary is an external file and the translate filter refreshes it automatically, so the process of updating the dictionary can be automated.

I have multiple scenarios like this, with automation scripts in bash or Python updating the dictionary files.
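As a rough sketch of what such a script could look like, assuming hypothetical employees.json and companies.json source files with employee_id/company_id and company_id/company_name pairs (your actual .json files and field names will differ), something like this could regenerate the two dictionaries on a schedule:

import json

# Hypothetical source files; adjust the paths and field names
# to match your actual .json files.
with open("/path/to/employees.json") as f:
    employees = json.load(f)  # e.g. [{"employee_id": 1, "company_id": 1001}, ...]

with open("/path/to/companies.json") as f:
    companies = json.load(f)  # e.g. [{"company_id": 1001, "company_name": "ACME"}, ...]

# Rewrite the employee_id -> company_id dictionary used by the first translate filter.
with open("/path/to/the/file/company_id.yml", "w") as f:
    for e in employees:
        f.write(f'"{e["employee_id"]}": "{e["company_id"]}"\n')

# Rewrite the company_id -> company_name dictionary used by the second translate filter.
with open("/path/to/the/file/company_name.yml", "w") as f:
    for c in companies:
        f.write(f'"{c["company_id"]}": "{c["company_name"]}"\n')

Run it from cron (or any scheduler) and the translate filters will pick up the new dictionaries on the next refresh_interval.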