The problem:
We operate an Elasticsearch cluster for storing netflow data. One of our clients operates a UDP service that is queried an insane amount by different IPs. This one client alone is responsible for 2/3 of the flows in Elasticsearch, adding about 3,000 flows every second of every day.
The question:
Is there a way to aggregate all these records, drop the source and destination IPs, and only keep the amount of traffic used per 5-second interval, while deleting (or better, never storing) the original records? That would reduce both storage and processing time.
I haven't finished evaluating it yet, but I am hopeful that the rollup API, introduced in 6.3, will facilitate this. If it works out, I will be rolling the necessary setup into ElastiFlow.
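To illustrate, a rollup job along these lines could bucket flows into fixed time intervals and keep only summed traffic counters, without grouping on source/destination IP. This is only a sketch: the index pattern, the `@timestamp` field, and the `netflow.bytes`/`netflow.packets` field names are assumptions (they roughly match ElastiFlow's schema, but check your own mapping):

```json
PUT _rollup/job/netflow_traffic
{
  "index_pattern": "netflow-*",
  "rollup_index": "netflow-rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "@timestamp",
      "interval": "5s"
    }
  },
  "metrics": [
    { "field": "netflow.bytes",   "metrics": ["sum"] },
    { "field": "netflow.packets", "metrics": ["sum"] }
  ]
}
```

One caveat: a rollup job summarizes documents into the rollup index but does not delete the originals, so to actually save storage you would still need to drop the raw indices separately (e.g. with Curator or a short index retention period).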