Elasticsearch transformation query

We are working on a data processing pipeline that involves multiple transformations. Specifically, we have a use case where the first transformation runs and calculates documents for various systems, including system1. In this transformation, we categorize documents based on their presence as either "Primary only", "Secondary only", or "Both".

In our second transformation, we need to calculate or process documents again for system1. I’m concerned about how changes from "Primary only" to "Both" from the first transformation will be managed. Specifically:

For example, suppose 10 logs currently have documents marked "Primary only", and the first transform then changes the status of 2 of those logs to "Both".

Since the documents with the "Primary only" status are already indexed, what’s the best approach to ensure these documents are properly updated or removed when their status changes to "Both"? We want to ensure that only "Primary only" documents are retained in the second transformation output.

Hi,

You may have multiple options to solve this problem. One that I can think of right now would be to chain transforms. Your logs would be the source for the first transform. This transform would write the new status (Primary/Secondary/Both) to its destination index. You could then use this destination index as the source of your second transform, which would only consider the updated status from your first transform. Depending on what you are trying to achieve, there may be better solutions.
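To make the chaining idea a bit more concrete, here is a minimal sketch of what the second transform's request body could look like, written as a Python dict so it can be printed and sent with whatever client you use. Everything named here is an assumption on my side: the index names (log-status, system1-latest), the fields (log_id, system, status, last_updated), and the transform id are made up, and I assume status is mapped as a keyword and last_updated is a date that the first transform's destination keeps current. The sketch uses a "latest" transform keyed on log_id, so a log whose status moves from "Primary only" to "Both" overwrites its existing destination document instead of leaving a stale one behind. As far as I understand, transforms update destination documents but do not delete them, which is why the status filter is applied when reading rather than in the transform's source query.

```python
import json

# All index names, field names, and the transform id below are hypothetical.
STATUS_INDEX = "log-status"        # destination index of the first transform
SYSTEM1_INDEX = "system1-latest"   # destination index of the second transform


def second_transform_body():
    """Request body for PUT _transform/system1-latest (the id is made up)."""
    return {
        "source": {
            "index": STATUS_INDEX,
            # Restrict by system only, not by status, so that a status change
            # still reaches the destination and overwrites the old value.
            "query": {"term": {"system": "system1"}},
        },
        "dest": {"index": SYSTEM1_INDEX},
        # One destination document per log_id, copied from the newest source doc.
        "latest": {"unique_key": ["log_id"], "sort": "last_updated"},
        # Continuous mode: changes are detected through the last_updated field.
        "sync": {"time": {"field": "last_updated", "delay": "60s"}},
        "frequency": "1m",
    }


def primary_only_query():
    """Search body that reads only the 'Primary only' documents."""
    return {"query": {"term": {"status": "Primary only"}}}


if __name__ == "__main__":
    print(json.dumps(second_transform_body(), indent=2))
    print(json.dumps(primary_only_query(), indent=2))
```

With your example, the 2 logs that changed to "Both" would keep a document in system1-latest, but its status field would now read "Both", so a search using primary_only_query() would return only the remaining 8. If you really need those documents physically removed, that would have to be a separate step (for example a delete-by-query on the destination), which this sketch does not cover.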
