I'm currently enriching two different types of events (type A and type B), which are ingested and stored in Elasticsearch at different times.
Keep in mind that:
- Events are correlated by a correlation_id field
- Correlation ratio typeA/typeB is: 1 to n
- Events are written at different times, event type A first
- Some fields of event type A need to be written to events of type B and vice versa
The enrichment pattern and scenario details follow:
ENRICHMENT PATTERN SUMMARY
During ingestion of type A and type B events, I write or update a document (_id = correlation_id) in a dedicated enrichment index, holding the desired enrichment data from both events. Later, a scheduled pipeline enriches the type A and type B events using the values of that enrichment document.
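A minimal sketch of the write/update step, assuming it is done with the Elasticsearch Update API and `doc_as_upsert`. The index name `enrichment-index`, the helper `build_upsert_request`, and the sample values are hypothetical; the code only builds the request path and body and does not talk to a cluster.

```python
import json

ENRICHMENT_INDEX = "enrichment-index"  # hypothetical name for the enrichment index


def build_upsert_request(correlation_id, fields):
    """Return (path, body) for an Update API call on the enrichment index.

    With doc_as_upsert, the document is created if it does not exist yet;
    otherwise `fields` are merged into the existing document.
    """
    path = f"/{ENRICHMENT_INDEX}/_update/{correlation_id}"
    body = {"doc": dict(fields), "doc_as_upsert": True}
    return path, body


# Type A arrives first and creates the enrichment document.
path_a, body_a = build_upsert_request("abc-123", {"field2": "a2", "field7": "a7"})
# Type B arrives later and adds its fields to the same document.
path_b, body_b = build_upsert_request("abc-123", {"field4": "b4", "field5": "b5"})

print("POST", path_a, json.dumps(body_a))
print("POST", path_b, json.dumps(body_b))
```

Both requests target the same _id, so the order in which the two event types arrive does not matter for the enrichment document itself.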
2 DIFFERENT EVENTS
Event type A:
- field1 (correlation_id)
- field2
- field3
- field7
Event type B:
- field1 (correlation_id)
- field4
- field5
- field6
INDEX 1: storing events of type A
INDEX 2: storing events of type B
INDEX 3: storing fields of both events (field2, field4, field5, field7)
DATA INGESTION/ENRICHMENT STEPS
t0 -> Ingestion of Event type A to INDEX 1; ingestion of a document containing field1, field2, field7 to INDEX 3, using the field1 value as _id
t1 -> Ingestion of Event type B to INDEX 2; update of the document stored in INDEX 3, adding field4 and field5 to it
t2 -> Scheduled enrichment (update) of Event type A and Event type B using the data of INDEX 3 (queried by field1)
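The three steps can be sketched as an in-memory simulation. This is only an illustration of the data flow, not Elasticsearch code: the dictionaries and the list stand in for the three indices, all function names are hypothetical, and it assumes that type A events receive field4/field5 while type B events receive field2/field7.

```python
# In-memory stand-ins for the three indices (hypothetical; no Elasticsearch calls).
index1 = {}   # INDEX 1: type A events, keyed by field1
index2 = []   # INDEX 2: type B events (n per correlation_id)
index3 = {}   # INDEX 3: enrichment documents, keyed by field1

A_FIELDS = ("field2", "field7")  # assumed: copied from type A into type B events
B_FIELDS = ("field4", "field5")  # assumed: copied from type B into type A events


def pick(event, names):
    """Return only the listed fields that are present in the event."""
    return {name: event[name] for name in names if name in event}


def ingest_a(event):  # t0
    index1[event["field1"]] = dict(event)
    index3.setdefault(event["field1"], {}).update(pick(event, A_FIELDS))


def ingest_b(event):  # t1
    index2.append(dict(event))
    index3.setdefault(event["field1"], {}).update(pick(event, B_FIELDS))


def scheduled_enrichment():  # t2
    # Type A events receive the type B fields held in the enrichment document.
    for cid, event_a in index1.items():
        event_a.update(pick(index3.get(cid, {}), B_FIELDS))
    # Type B events receive the type A fields held in the enrichment document.
    for event_b in index2:
        event_b.update(pick(index3.get(event_b["field1"], {}), A_FIELDS))


ingest_a({"field1": "c1", "field2": "a2", "field3": "a3", "field7": "a7"})
ingest_b({"field1": "c1", "field4": "b4", "field5": "b5", "field6": "b6"})
scheduled_enrichment()
print(index1["c1"])
print(index2[0])
```

In a real cluster the t2 step would be an update against INDEX 1 and INDEX 2; the simulation only shows which fields move in which direction.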
PRO of this solution:
- no additional load during data ingestion on t0 and t1
In your experience:
- Is this the best asynchronous enrichment solution?
- Is this something that could be done using the new enrich processor feature of ES 7.5? It doesn't seem so, because in this scenario data needs to be enriched in both directions (from index 1 to index 2 and vice versa)
- Do you have any advice?
Thank you in advance,