We have several configuration files that include about 50 filters. It takes about 9 minutes to filter 10,000,000 events.
After reducing to 30 filters, it takes around 6 minutes to filter the same 10,000,000 events.
In the future we will need to add more filters, and we are concerned about performance if too many filters are used. We considered developing a new filter plugin to improve performance, but that appears to be time-consuming. We're wondering if there are other solutions that can improve performance when many filters (100+) are used.
Is it all a lot of conditionals around different CSV formats? Is there any other type of processing that could be slow? What inputs do you have? How many columns do the CSV files typically have?
Yes, the conditionals are all around CSV formats. The filters mostly use the csv filter, and the configuration options used are mostly "columns" and "convert".
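To give an idea of the shape of our configuration, a typical block looks roughly like the sketch below (the column names and the `[type]` value are made up for illustration; the real files use our own field names):

```
filter {
  if [type] == "app_log" {
    csv {
      # Assign names to the comma-separated columns
      columns => ["timestamp", "host", "status", "bytes"]
      # Convert selected columns from strings to numbers
      convert => {
        "status" => "integer"
        "bytes"  => "integer"
      }
    }
  }
}
```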
The event sources are log files from the OS and some applications.
Hi Christian
Thanks for your update. I will test whether the dissect filter can improve performance in our scenario. In addition, I found another symptom: if the lowercase option in the mutate filter is commented out, the transfer rate increases from 24KiB/s to 30KiB/s. Is this working as designed? Is there some solution that can replace "lowercase" in the mutate filter to improve performance?
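For the dissect test, I am planning to try something like the sketch below. Dissect splits delimited text by fixed delimiters rather than parsing, which may be cheaper than csv for lines with a fixed format. The field names here are hypothetical and the mapping would need to match our actual log layout:

```
filter {
  dissect {
    # Split a comma-delimited line into named fields.
    # Every literal delimiter in the mapping must appear in the event.
    mapping => {
      "message" => "%{timestamp},%{host},%{status},%{bytes}"
    }
  }
}
```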
In addition, I tried a similar scenario on another box, and there is no obvious difference between enabling and disabling the lowercase option there. I am wondering which parameter or setting could trigger this difference for the lowercase option?
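One replacement I could test is lowercasing the field in a ruby filter instead of mutate. This is only a sketch with a hypothetical field name, and ruby filters often carry their own overhead, so whether it is actually faster would need benchmarking:

```
filter {
  ruby {
    # Downcase a single field; to_s guards against a missing field.
    code => "event.set('host', event.get('host').to_s.downcase)"
  }
}
```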