I need to batch/accumulate events based on custom criteria, and I was thinking about developing a processor to do so. I think I understand how I could do the aggregation in a custom processor and drop the individual events, but I'm not quite seeing how I could create new events to be shipped off by the outputters once the batching is complete.
This isn't possible with the current processor interfaces. The processors are event-driven: they take one event in and emit zero or one events out.
A stateful processor that takes in arbitrary events, processes them, and at some point emits an event does sound like it could be useful in some circumstances. The interface would also need to be notified when the pipeline is shutting down so that it could flush any final data.
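To make the idea concrete, here is a minimal Go sketch of what such an interface might look like. Everything in it is hypothetical: the `Event` type, the `Processor` and `StatefulProcessor` interfaces, and the `batcher` example are assumptions for illustration, not the pipeline's actual API.

```go
package main

import "fmt"

// Event stands in for the pipeline's event type (an assumption here).
type Event struct {
	Fields map[string]interface{}
}

// Processor mirrors the current contract described above:
// one event in, zero or one events out (return nil to drop).
type Processor interface {
	Process(e *Event) (*Event, error)
}

// StatefulProcessor is the hypothetical extension: it may buffer any
// number of events and emit zero or more at any point, and Flush is
// called on pipeline shutdown so remaining state isn't lost.
type StatefulProcessor interface {
	Process(e *Event) ([]*Event, error)
	Flush() ([]*Event, error)
}

// batcher is a toy StatefulProcessor that accumulates events and
// emits a single combined event once `size` events have arrived.
type batcher struct {
	size   int
	buffer []*Event
}

func (b *batcher) Process(e *Event) ([]*Event, error) {
	b.buffer = append(b.buffer, e)
	if len(b.buffer) < b.size {
		return nil, nil // keep accumulating; emit nothing downstream yet
	}
	return b.Flush()
}

// Flush merges whatever is buffered into one new event, so a final
// partial batch is still shipped when the pipeline shuts down.
func (b *batcher) Flush() ([]*Event, error) {
	if len(b.buffer) == 0 {
		return nil, nil
	}
	merged := &Event{Fields: map[string]interface{}{
		"batch_size": len(b.buffer),
	}}
	b.buffer = b.buffer[:0]
	return []*Event{merged}, nil
}

func main() {
	var p StatefulProcessor = &batcher{size: 3}
	for i := 0; i < 4; i++ {
		out, _ := p.Process(&Event{Fields: map[string]interface{}{"n": i}})
		fmt.Println("emitted:", out)
	}
	// Shutdown path: flush the final partial batch.
	out, _ := p.Flush()
	fmt.Println("flushed:", out)
}
```

The key design point is the `Flush` method: without a shutdown hook, any events sitting in the buffer when the pipeline stops would be silently dropped.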