Bidirectional asynchronous data enrichment

Hi guys,
I'm currently enriching two different types of events (type A and type B), which are ingested and stored at different times in Elasticsearch.
Keep in mind that:

  1. Events are correlated by a correlation_id field
  2. Correlation ratio typeA/typeB is: 1 to n
  3. Events are written at different times, event type A first
  4. Some fields of event type A need to be written to events of type B and vice versa

Here are the enrichment pattern and the scenario details:

During ingestion of type A and type B events, I write or update a document (_id=correlation_id) in a dedicated enrichment index, holding the desired enrichment data from both event types. Later, a scheduled pipeline enriches the type A and type B events using the values stored in that enrichment document.


Event type A:
- field1 (correlation_id)
- field2
- field3
- field7

Event type B:
- field1 (correlation_id)
- field4
- field5
- field6


INDEX 1: stores events of type A
INDEX 2: stores events of type B
INDEX 3: stores fields of both event types (field2, field4, field5, field7)


t0 -> Ingestion of Event type A to INDEX 1, ingestion of an event containing field1,field2,field7 to INDEX 3 using field1 value as _id
t1 -> Ingestion of Event type B to INDEX 2, update of event stored in INDEX 3, adding field4, field5 to it
t2 -> Scheduled enrichment (update) of Event type A and Event type B using data of INDEX 3 (queried by field1)
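
To make the timeline concrete, here is a minimal sketch of the request bodies I have in mind for each step. It only builds the paths and JSON bodies and does not talk to a cluster. The index names (index1, index2, index3) and the helper names are placeholders made up for this example, and the term query assumes field1 is mapped as a keyword. The t0/t1 write uses the Update API with doc_as_upsert, and the t2 step uses _update_by_query with a small Painless script.

```python
from typing import Any, Dict, Tuple

# Placeholder name for INDEX 3 (the enrichment index); an assumption for this sketch.
ENRICH_INDEX = "index3"


def upsert_request(correlation_id: str, fields: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Build the Update API call used at t0 and t1.

    With doc_as_upsert the document is created if it does not exist yet (t0),
    otherwise the given fields are merged into the existing document (t1).
    """
    path = f"/{ENRICH_INDEX}/_update/{correlation_id}"
    body = {"doc": fields, "doc_as_upsert": True}
    return path, body


def enrichment_request(
    target_index: str, correlation_id: str, fields: Dict[str, Any]
) -> Tuple[str, Dict[str, Any]]:
    """Build the _update_by_query call used at t2.

    Every event in target_index whose field1 equals correlation_id gets the
    given fields copied into its _source. Assumes field1 is a keyword field.
    """
    path = f"/{target_index}/_update_by_query"
    body = {
        "query": {"term": {"field1": correlation_id}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.putAll(params.fields)",
            "params": {"fields": fields},
        },
    }
    return path, body


# t0: event type A arrives, its enrichment fields go to INDEX 3
t0 = upsert_request("abc", {"field2": "a2", "field7": "a7"})
# t1: event type B arrives, its enrichment fields are merged into the same document
t1 = upsert_request("abc", {"field4": "b4", "field5": "b5"})
# t2: scheduled job copies B's fields onto A and A's fields onto every matching B
t2_a = enrichment_request("index1", "abc", {"field4": "b4", "field5": "b5"})
t2_b = enrichment_request("index2", "abc", {"field2": "a2", "field7": "a7"})
```

In a real job the field values for t2 would be read from the INDEX 3 document first; they are hard-coded here only to keep the sketch short.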

Advantage of this solution:

  • no additional load during data ingestion at t0 and t1

According to your experience:

  1. Is this the best asynchronous enrichment solution?
  2. Is this something that could be done using the new enrich processor feature of ES 7.5? It doesn't seem so, because in this scenario data needs to be enriched in both directions (from index 1 to index 2 and vice versa)
  3. Do you have any advice?

Thank you in advance,

