Best strategy to "JOIN" data from different .json files (Data enrichment? Rollup? Transform? Dictionary?)

I would say that it is way easier to enrich the data before indexing it.

Rollups are deprecated; they were replaced by downsampling, but that only works for metrics and is not used to enrich data.

Transforms are also not used to enrich data, but to summarize it, for example getting the last login of a user.

To enrich the data you can use some filters in Logstash, like the translate filter, or the enrich processor in an Elasticsearch ingest pipeline.

Personally I do not like using the enrich processor; it is less flexible than the translate filter in Logstash, and I only use it when I'm not using Logstash.

So the best way in my opinion would be to use the translate filter.

If your events arriving in Logstash look like this:

{ "employee_id":1, "name":"John" }

You would need at least two translate filters: one to get the company_id for the employee, and another to get the company_name for that company_id.

The first dictionary could be like this, where the key is the employee id and the value is the company id:

"1": "1001"
"2": "1001"

And in the second, the key would be the company id and the value the company name:

"1001": "ACME"
"1002": "Capsule"

Your translate filters could look like this:

translate {
	source => "employee_id"
	target => "company_id"
	dictionary_path => "/path/to/the/file/company_id.yml"
	# reload the dictionary file every 60 seconds
	refresh_interval => 60
	# value used for company_id when the employee_id is not in the dictionary
	fallback => "unknown company id"
}

translate {
	source => "company_id"
	target => "company_name"
	dictionary_path => "/path/to/the/file/company_name.yml"
	refresh_interval => 60
	fallback => "unknown company name"
	# drop the intermediate company_id field after a successful lookup
	remove_field => ["company_id"]
}

This should give you the result you expect.
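For the sample event above, and assuming employee 1 maps to company 1001 (ACME) as in the example dictionaries, the enriched event would look something like this:

{ "employee_id":1, "name":"John", "company_name":"ACME" }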

Can this be automated? Yes: the dictionary is an external file and the translate filter refreshes it automatically, so the process of updating the dictionary can be automated.

I have multiple scenarios like this, with automation scripts in bash or Python updating the dictionary files.
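As a rough sketch of what such a script could look like, assuming hypothetical employees.json and companies.json source files with employee_id/company_id and company_id/company_name pairs (your actual .json files and field names will differ), something like this could regenerate the two dictionaries on a schedule:

import json

# Hypothetical source files; adjust the paths and field names
# to match your actual .json files.
with open("/path/to/employees.json") as f:
    employees = json.load(f)  # e.g. [{"employee_id": 1, "company_id": 1001}, ...]

with open("/path/to/companies.json") as f:
    companies = json.load(f)  # e.g. [{"company_id": 1001, "company_name": "ACME"}, ...]

# Rewrite the employee_id -> company_id dictionary used by the first translate filter.
with open("/path/to/the/file/company_id.yml", "w") as f:
    for e in employees:
        f.write(f'"{e["employee_id"]}": "{e["company_id"]}"\n')

# Rewrite the company_id -> company_name dictionary used by the second translate filter.
with open("/path/to/the/file/company_name.yml", "w") as f:
    for c in companies:
        f.write(f'"{c["company_id"]}": "{c["company_name"]}"\n')

Run it from cron (or any scheduler) and the translate filters will pick up the new dictionaries on the next refresh_interval.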