How to join two stream data sources and find matches

I'm asking for an idea or approach to solve the following business problem:

Two stream data sources (A and B) continuously ingesting events into two separate indices (A and B) in Elasticsearch. Each of them has a unique _id.

My goal is to merge Index A with Index B based on the unique _id field of the document and split the result into three buckets:

Bucket 1: All documents from Index A without a matching _id in Index B
Bucket 2: All documents from Index B without a matching _id in Index A
Bucket 3: All documents that matched, which should be stored as a single merged document.

Keep in mind that the data might be incomplete at any given time, but documents may still match at a later stage because ingestion is continuous.
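To make the intended bucket logic concrete, here is a plain Python sketch over in-memory dicts keyed by `_id` (the names `index_a` and `index_b` are hypothetical stand-ins for the two indices; this ignores the streaming/late-match aspect and just shows the split I'm after):

```python
def split_buckets(index_a, index_b):
    """Split two dicts of documents (keyed by _id) into three buckets."""
    ids_a, ids_b = set(index_a), set(index_b)

    # Bucket 1: documents only in A
    only_a = {i: index_a[i] for i in ids_a - ids_b}
    # Bucket 2: documents only in B
    only_b = {i: index_b[i] for i in ids_b - ids_a}
    # Bucket 3: matched ids, merged into a single document
    matched = {i: {**index_a[i], **index_b[i]} for i in ids_a & ids_b}

    return only_a, only_b, matched


a = {"1": {"src_a": True}, "2": {"src_a": True}}
b = {"2": {"src_b": True}, "3": {"src_b": True}}
only_a, only_b, matched = split_buckets(a, b)
print(only_a)   # {'1': {'src_a': True}}
print(only_b)   # {'3': {'src_b': True}}
print(matched)  # {'2': {'src_a': True, 'src_b': True}}
```

In Elasticsearch terms, Bucket 3 corresponds to an upsert: whichever event arrives second merges its fields into the already-indexed document with the same `_id`.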

Can I implement this scenario with Logstash alone (and if so, how?), or do I need additional components such as Kafka and/or a message queue? What is the most convenient approach? Any experiences?

Thanks for your help.
