Elasticsearch: querying a document to grab a field using another document's field value

Hello all,

I'm unable to find documentation on the following, which I believe might be a niche implementation, but I will try to explain it as best I can. I'm inexperienced here, but at first glance this seems to relate to runtime fields.

Goal:

Use field data from one document type ("child") to match another document ("parent") and obtain its unique attribute data. All documents are part of the same index. The parent event contains unique fields not provided in a child event, but the child event has an identifier which links to the parent.

Explanation:

The configuration is Logstash shipping logs to Elasticsearch (7.16.1). The application generates JSON messages with different structures depending on the originating object. Logstash filters each unique message "type" and makes it its own root object for indexing. Example:

Message 1:

{
	"type": "ship",
	"uniqueId": 1.1,
	"originatingCountry": 2
}

Message 2:

{
	"type": "container",
	"uniqueId": 1.2,
	"sourceId": 1.1,
	"weight": 2
}

The above would generate two root-level object types: ship and container. All documents are ingested and sorted by event timestamp, but they arrive at different times as events are generated.

In the above scenario, the index would be filled with several events: multiple unique ships and multiple unique containers. Assuming I have a few documents with container data, I would be able to sort and visualize all the container-related data based on container.uniqueId.

Say you now want to cross-reference some of the container data with the originating country to set up further visuals. You would set up a filter (a runtime field, maybe?) which verifies something along the lines of:

if container.sourceId exists (for this document)
then check if container.sourceId == ship.uniqueId (go look up all existing documents for ship id)
create new field container.originatingCountry

This requires a degree of provenance which is probably not the design intent of Elasticsearch, but any help would be appreciated.
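To make the intended logic concrete, here is a minimal sketch of the lookup applied to plain in-memory dicts rather than to Elasticsearch documents. The function name attach_originating_country is hypothetical; the field names come from the two example messages above. It only illustrates the desired result and is not how Elasticsearch itself would resolve the link.

```python
# Illustration only: the lookup from the pseudocode above, applied to
# in-memory dicts instead of Elasticsearch documents.

def attach_originating_country(events):
    """Return container events, each with originatingCountry copied
    from the ship whose uniqueId equals the container's sourceId."""
    # Index every ship by uniqueId first, so arrival order does not matter.
    ships = {e["uniqueId"]: e for e in events if e.get("type") == "ship"}

    enriched = []
    for event in events:
        if event.get("type") != "container":
            continue
        result = dict(event)  # copy, so the input event is left untouched
        ship = ships.get(event.get("sourceId"))
        if ship is not None:
            result["originatingCountry"] = ship["originatingCountry"]
        enriched.append(result)
    return enriched


# The container arrives before its ship, as can happen with real events.
events = [
    {"type": "container", "uniqueId": 1.2, "sourceId": 1.1, "weight": 2},
    {"type": "ship", "uniqueId": 1.1, "originatingCountry": 2},
]
print(attach_originating_country(events))
```

Building the ship lookup before touching any container is what makes the result independent of event order, which is the part that is hard to reproduce at ingest time.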

Edit:

To be clear, I'm encountering issues in the "then check if" stage of the logic, because the events occur at different times.

My first thought was a runtime lookup field, but I don't think you can do this with all documents in the same index. Which leads me to ask: can you put them in separate indices and leverage this?

Also, if you can put them into different indices, then you might also be able to use the enrich processor (Enrich processor | Elasticsearch Guide [8.7] | Elastic) for new documents coming in, which would be a more efficient method.
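A rough sketch of what the enrich setup could look like, written as Python dicts that mirror the REST request bodies. The index name ships, the policy name ship-country-policy, the pipeline name container-enrich, and the target field ship are all assumptions made for illustration. It also assumes uniqueId and sourceId are mapped so that an exact match works. The enrich processor matches on field value, not on timestamp, but a container is only enriched if its ship was already in the source index when the policy was last executed, so the policy would need re-executing as new ships arrive.

```python
import json

SHIP_INDEX = "ships"                  # hypothetical index holding only ship documents
POLICY_NAME = "ship-country-policy"   # hypothetical policy name
PIPELINE_NAME = "container-enrich"    # hypothetical pipeline name


def enrich_policy_body():
    """Body for PUT /_enrich/policy/<name>: match ships on uniqueId."""
    return {
        "match": {
            "indices": SHIP_INDEX,
            "match_field": "uniqueId",
            "enrich_fields": ["originatingCountry"],
        }
    }


def pipeline_body():
    """Body for PUT /_ingest/pipeline/<name>: look up sourceId in the policy."""
    return {
        "description": "Copy ship data onto incoming container documents",
        "processors": [
            {
                "enrich": {
                    "policy_name": POLICY_NAME,
                    "field": "sourceId",      # value taken from the container
                    "target_field": "ship",   # matched ship fields land here
                    "ignore_missing": True,   # ship documents have no sourceId
                }
            }
        ],
    }


def requests_in_order():
    """The three calls, in the order they would need to be issued."""
    return [
        ("PUT", f"/_enrich/policy/{POLICY_NAME}", enrich_policy_body()),
        ("POST", f"/_enrich/policy/{POLICY_NAME}/_execute", None),
        ("PUT", f"/_ingest/pipeline/{PIPELINE_NAME}", pipeline_body()),
    ]


for method, path, body in requests_in_order():
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

With this layout the country would end up under ship.originatingCountry on each enriched container document, rather than in container.originatingCountry as in the original pseudocode.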

Yes, it is possible to put the data in separate indices. However, I had looked at the enrich processor and didn't really see how it could link two events that happen at different timestamps.

One thing I found which addresses the parent <-> child relationship directly is Mapping Joins (the join field type). I'm having trouble getting it to work within an index template, although following this example and adding data manually was successful.
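For reference, a minimal sketch of a join mapping placed inside a composable index template, plus a has_parent query that finds containers by their ship's country. The index pattern shipping-* and the join field name relation are assumptions, not taken from the thread. The dicts mirror the bodies for PUT /_index_template/<name> and for a search request.

```python
def join_index_template(pattern="shipping-*"):
    """Body for PUT /_index_template/<name> with a ship -> container join."""
    return {
        "index_patterns": [pattern],  # hypothetical index pattern
        "template": {
            "mappings": {
                "properties": {
                    # "relation" is a hypothetical name for the join field.
                    "relation": {
                        "type": "join",
                        "relations": {"ship": "container"},  # parent: child
                    }
                }
            }
        },
    }


def containers_by_ship_country(country):
    """Search body: containers whose parent ship has this originatingCountry."""
    return {
        "query": {
            "has_parent": {
                "parent_type": "ship",
                "query": {"term": {"originatingCountry": country}},
            }
        }
    }


print(join_index_template())
print(containers_by_ship_country(2))
```

Note that has_parent only selects the matching child documents. It does not copy originatingCountry onto them, so visualizations that need the value as a field on the container would still need something like the enrich approach.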

I'm wondering whether the routing= parameter (see the above link for full context) is applied automatically when using an index template and ingesting from Logstash to Elasticsearch. I had ES redundancy set up, but could have this unique index go to only one node.

It's not, no.
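Since routing is not added automatically, each child document has to carry it explicitly. Below is a minimal sketch of how the index request for a container could be built. It assumes ship documents are indexed with _id equal to their uniqueId and that the join field is named relation; both are assumptions for illustration. The join field documentation describes routing as required for child documents so that parent and child land on the same shard.

```python
from urllib.parse import urlencode


def child_index_request(index, container, join_field="relation"):
    """Build (method, path, body) for indexing a container as a child of its ship.

    Assumes the parent ship was indexed with _id == its uniqueId.
    """
    parent_id = str(container["sourceId"])
    body = dict(container)  # copy, so the input event is left untouched
    body[join_field] = {"name": "container", "parent": parent_id}
    # Route by the parent id so the child is stored on the parent's shard.
    query = urlencode({"routing": parent_id})
    path = f"/{index}/_doc/{container['uniqueId']}?{query}"
    return "PUT", path, body


container = {"type": "container", "uniqueId": 1.2, "sourceId": 1.1, "weight": 2}
print(child_index_request("shipping-2023", container))
```

On the Logstash side, the elasticsearch output plugin has a routing option that accepts a field reference, so something along the lines of routing => "%{sourceId}" may be what is needed there. Treat that as a pointer to check against the plugin documentation rather than a tested configuration.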
