I'm not a logstash expert, but it's worth investigating the robustness issues with that approach.
If you're relying on an in-memory cache to join events a1 and a2 (and so on) then you have to ask yourself these questions:
- What happens when the logstash process dies with a buffer full of a1, z1, etc. but no a2, z2 events to complete them?
- What happens if a1 and a2 are very far apart in time? How much RAM do you use to keep [x]1 events around waiting for the corresponding [x]2s? What happens if you purge that RAM?
- What happens if one logstash process requires too many resources to do this join? Can you route events to multiple logstash workers based on ID? (I imagine this is possible, but it's something you'd need to consider.)
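To make the first question concrete, here's a minimal Python sketch (not logstash itself; the event shape and field names are purely illustrative) of the naive in-memory join:

```python
# Naive in-memory join: buffer "start" events (a1, z1, ...) until the
# matching "end" event (a2, z2, ...) arrives. Nothing is ever evicted,
# so unmatched starts accumulate in RAM indefinitely, and all of them
# are lost if the process dies.
pending = {}  # id -> buffered start event


def on_event(event):
    """Return a joined record when a pair completes, else None."""
    if event["phase"] == "start":
        pending[event["id"]] = event        # held until the end arrives
        return None
    start = pending.pop(event["id"], None)  # may never happen
    if start is None:
        return None                         # an end with no buffered start
    return {"id": event["id"], "duration": event["ts"] - start["ts"]}
```

If the a2 for some ID never arrives, its entry simply sits in `pending` forever, which is exactly the unbounded-memory risk described above.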
Re the second question: a quick search suggests that by default a1 will hang around indefinitely in RAM waiting for a2, which has previously caused memory issues - see here. Again, I'm not a logstash expert, but I assume you'll have to pick between risking RAM exhaustion and adopting a buffer-ageing policy that can potentially lose data.
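Continuing the same toy model (a dict of buffered start events keyed by ID; `MAX_AGE` is an assumed knob, not a logstash setting), a buffer-ageing policy might look like this:

```python
MAX_AGE = 300.0  # seconds a buffered start may wait; an assumed policy knob

pending = {}  # id -> (start event, arrival timestamp)


def evict_stale(now):
    """Drop buffered starts older than MAX_AGE seconds.

    This bounds RAM, but if the matching end event arrives after
    eviction, the pair is silently lost - the data-loss side of the
    tradeoff.
    """
    stale = [k for k, (_, arrived) in pending.items() if now - arrived > MAX_AGE]
    for k in stale:
        del pending[k]  # this join can never complete now
    return stale
```

The choice of `MAX_AGE` is the whole tradeoff in one number: too large and you're back to the memory problem, too small and legitimately slow pairs never join.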
The more complex architecture I outline in the video doesn't rely on fallible RAM buffers but may be more work for you to implement.