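Keeping a message until its matching message (e.g. the end event for a start event) arrives has its limitations:
- you need resources to hold the pending messages
- keeping them in memory is not fail-safe: if the worker crashes, the pending messages are lost
- what happens if the corresponding end message never arrives? You would need something like a TTL to expire stale entries.
- messages must be routed to the same worker in order to be matched, which might introduce hot-spot workers
You have to decide whether you are OK with these limitations.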
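To make these trade-offs concrete, here is a minimal sketch of the in-memory approach in Python. It is only an illustration, not taken from any specific tool: the class `PendingStartStore`, the helper `worker_for`, and the idea of a per-message key are all assumptions of mine.

```python
import time
import zlib
from typing import Dict, List, Optional, Tuple


def worker_for(key: str, num_workers: int) -> int:
    """Route all messages with the same key to the same worker (hypothetical helper).

    A skewed key distribution is what produces hot-spot workers.
    """
    return zlib.crc32(key.encode("utf-8")) % num_workers


class PendingStartStore:
    """Holds start events in memory until the matching end event arrives."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        # key -> (event timestamp of the start message, time it was stored)
        self._pending: Dict[str, Tuple[float, float]] = {}

    def on_start(self, key: str, timestamp: float, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._pending[key] = (timestamp, now)

    def on_end(self, key: str, timestamp: float) -> Optional[float]:
        """Return the duration if a start was pending for this key, else None."""
        entry = self._pending.pop(key, None)
        if entry is None:
            return None
        return timestamp - entry[0]

    def expire(self, now: Optional[float] = None) -> List[str]:
        """Drop entries older than the TTL and return their keys."""
        now = time.monotonic() if now is None else now
        expired = [
            key
            for key, (_, stored_at) in self._pending.items()
            if now - stored_at > self.ttl_seconds
        ]
        for key in expired:
            del self._pending[key]
        return expired
```

Everything in `_pending` is gone if the process dies, and `expire` has to be called regularly, otherwise unmatched start events pile up forever.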
An alternative is to index the messages as they are and build sessions out of them in a second pass. Such a system can be built with Transform. The transform can be set up as a continuous transform, so it sessionizes your data as it comes in.
This approach is more scalable and fail-safe, but there is no free lunch: you need more resources, because you keep a larger source index. The source index can however be managed by ILM. If you only care about the sessions, you could use a rolling index whose old data gets deleted, e.g. every week.
We have an example for calculating the duration between two events, which I think could be useful for you.
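As a rough idea of what such an ILM policy could look like, here is a sketch that builds the policy body in Python. The function name and the chosen ages are just assumptions; the body would be sent with `PUT _ilm/policy/<policy-name>`.

```python
import json


def build_weekly_delete_policy(retention: str = "7d", rollover_age: str = "1d") -> dict:
    """Build an ILM policy body that rolls over regularly and deletes old indices.

    The ages are illustrative assumptions, adjust them to your data volume.
    """
    return {
        "policy": {
            "phases": {
                # Roll over to a new backing index once the current one is old enough.
                "hot": {"actions": {"rollover": {"max_age": rollover_age}}},
                # Delete an index once it is `retention` old (counted from rollover).
                "delete": {"min_age": retention, "actions": {"delete": {}}},
            }
        }
    }


print(json.dumps(build_weekly_delete_policy(), indent=2))
```

Make sure the retention is longer than your longest expected session, otherwise the transform cannot see the start event anymore when the end event arrives.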
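To give an idea of the shape of such a transform, here is a sketch of my own (not the example mentioned above) that builds a continuous transform body in Python. The index names `events` and `sessions` and the fields `session_id` and `@timestamp` are assumptions, so replace them with whatever identifies a session in your data.

```python
import json


def build_session_transform(
    source_index: str, dest_index: str, session_field: str, time_field: str
) -> dict:
    """Build the body of a continuous pivot transform that groups events into sessions."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        # How often the transform checks the source index for changes.
        "frequency": "1m",
        # `sync` makes the transform continuous; `delay` allows for ingest lag.
        "sync": {"time": {"field": time_field, "delay": "60s"}},
        "pivot": {
            # One output document per session key.
            "group_by": {"session": {"terms": {"field": session_field}}},
            "aggregations": {
                "session_start": {"min": {"field": time_field}},
                "session_end": {"max": {"field": time_field}},
                # Duration in milliseconds between the first and the last event.
                "duration_ms": {
                    "bucket_script": {
                        "buckets_path": {
                            "start": "session_start.value",
                            "end": "session_end.value",
                        },
                        "script": "params.end - params.start",
                    }
                },
            },
        },
    }


body = build_session_transform("events", "sessions", "session_id", "@timestamp")
print(json.dumps(body, indent=2))
```

The printed body would be sent with `PUT _transform/<transform-id>`, followed by `POST _transform/<transform-id>/_start`. Because the grouping happens inside Elasticsearch, no worker has to keep unmatched start events in memory.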