Best strategy to "JOIN" data from diferent .json files (Data enrichment? Rollup? Transform? Dictionary?)

Let's say I have three .json files containing this information:
employees.json:

{
"employee_id":1,
"name":"John"
},
{
"employee_id":2,
"name":"Elton"
}

companies.json:

{
"company_id":1001,
"company_name":"ACME"
},
{
"company_id":1002,
"company_name":"Capsule"
}

where_they_work.json:

{
"company_id":1001,
"employee_id":1
},
{
"company_id":1001,
"employee_id":"2
}

What I want to have, at the end, is one document in Elasticsearch with this information:
Expected result:

{
"employee_id":1,
"name":"John",
"company_name": "ACME"
},
{
"employee_id":2,
"name":"Elton",
"company_name": "ACME"
}

What's the best approach to achieve it? I can imagine at least these options:

  1. Dictionaries (Logstash; not a good idea, because I'd need to manually update the dictionary all the time)
  2. Data enrichment: could work, but I don't know if it will work across the three .json files/indexes; maybe it works fine with two.
  3. "Fancier" options: Rollups? Transforms? Something else?

Please help!

Are you already using Logstash? How are you indexing your data?

This can be done in an easy way using the translate filter in Logstash, but you would need to change the format of your json files.

But how exactly depends on which information is the source and which one you want to enrich it with.

For example, do your documents have the company_id and employee_id fields with their numeric IDs, and you want to enrich them with the company and employee names?

Yes, I'm using Logstash to ship the data from the .json files to Elasticsearch.
The translate filter plugin is what I meant when I said using dictionaries. I can do that, but the problem is that this data will be dynamic, so I'd be obliged to update the dictionary too often.
So far, the only data in ES is the data from employees.json in my example. I have to decide how to get the other information into ES so that the company_name ends up in the same index as the original info (employee_id and name in my example).

I would say that it is way easier to enrich the data before indexing it.

Rollups are deprecated; they were replaced by downsampling, but that only works for metrics and is not used to enrich data.

Transforms are also not used to enrich data, but to summarize it, like getting the last login of a user and things like that.

To enrich the data you may use some filters in Logstash, like the translate filter, or the enrich processor in an Elasticsearch ingest pipeline.

Personally, I do not like using the enrich processor; it is less flexible than the translate filter in Logstash, and I only use it when I'm not using Logstash.
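
For reference, this is roughly what the enrich processor route would involve. This is only a minimal sketch, assuming the company data is already indexed into an index called companies; the policy and pipeline names here are hypothetical, and you would still need a second policy for the where_they_work data:

PUT /_enrich/policy/company-policy
{
  "match": {
    "indices": "companies",
    "match_field": "company_id",
    "enrich_fields": ["company_name"]
  }
}

POST /_enrich/policy/company-policy/_execute

PUT /_ingest/pipeline/add-company-name
{
  "processors": [
    {
      "enrich": {
        "policy_name": "company-policy",
        "field": "company_id",
        "target_field": "company"
      }
    }
  ]
}

The enrich processor copies the whole matching document under target_field, so you would still need rename/remove processors to end up with just company_name at the top level, and you have to execute the policy again every time the source index changes, which is part of why I find it less flexible.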

So the best way in my opinion would be to use the translate filter.

If the events arriving in Logstash look like this:

{ "employee_id":1, "name":"John" }

You would need at least two translate filters: one to get the company_id for the employee, and another to get the company_name for that company_id.

The first dictionary could look like this, where the key is the employee id and the value is the company id (this is the company_id.yml file referenced below):

"1": "1001"
"2": "1001"

And in the second one (company_name.yml), the key would be the company id and the value the company name:

"1001": "ACME"
"1002": "Capsule"

Your translate filters could look like this:

translate {
	# look up the employee's company_id, using employee_id as the dictionary key
	source => "employee_id"
	target => "company_id"
	dictionary_path => "/path/to/the/file/company_id.yml"
	# re-read the dictionary file every 60 seconds
	refresh_interval => 60
	fallback => "unknown company id"
}

translate {
	# look up the company_name, using the company_id set by the previous filter
	source => "company_id"
	target => "company_name"
	dictionary_path => "/path/to/the/file/company_name.yml"
	refresh_interval => 60
	fallback => "unknown company name"
	# drop the intermediate company_id so the event matches the expected result
	remove_field => ["company_id"]
}

This will give you your expected result.

Can you not automate this? The dictionaries can be external files, and the translate filter refreshes them automatically, so the process of updating them can be automated.

I have multiple scenarios like this, with automation scripts in Bash or Python updating the dictionary files.
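
As an illustration, here is a minimal sketch of such a script in Python, assuming the source files hold JSON arrays of records and that PyYAML is installed; all paths are placeholders to adjust:

import json

import yaml  # PyYAML, assumed to be installed

# Placeholder paths -- adjust to your environment.
WHERE_THEY_WORK = "/path/to/where_they_work.json"
COMPANIES = "/path/to/companies.json"
COMPANY_ID_DICT = "/path/to/the/file/company_id.yml"
COMPANY_NAME_DICT = "/path/to/the/file/company_name.yml"

def load_records(path):
    # Assumes the file holds a JSON array of objects.
    with open(path) as f:
        return json.load(f)

# First dictionary: employee_id -> company_id
company_id_map = {
    str(rec["employee_id"]): str(rec["company_id"])
    for rec in load_records(WHERE_THEY_WORK)
}

# Second dictionary: company_id -> company_name
company_name_map = {
    str(rec["company_id"]): rec["company_name"]
    for rec in load_records(COMPANIES)
}

with open(COMPANY_ID_DICT, "w") as f:
    yaml.safe_dump(company_id_map, f, default_flow_style=False)

with open(COMPANY_NAME_DICT, "w") as f:
    yaml.safe_dump(company_name_map, f, default_flow_style=False)

Run it from cron, or whenever the source files change; with refresh_interval => 60, Logstash will pick up the new dictionaries within a minute.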

I guess you're right. I'll try to figure out how to automate the refreshing of those dictionaries. Thanks for your help.