There are many ways to achieve a similar goal with Logstash filters, so I would like to discuss and compare KV, Dissect, Split, and Grok: which is the better way of handling data in each case?
Scenario 1:
Mapping & parsing logs with a consistent delimiter
Better: KV
Scenario 2:
Mapping & parsing logs with an inconsistent pattern
Better: If both Dissect and Grok can achieve the same goal, which one gives more efficient parsing?
Scenario 3:
Extracting data from a field (for example, extracting the server name, FQDN, or domain from a URL)
Better: If both Split and Grok can achieve the goal, which one is more efficient to use?
If anyone has examples, it would be nice to discuss them together too!
In general I would say that the extreme flexibility of grok comes at a performance cost compared to dissect.
That said, unless you are processing very large volumes of data it does not make sense to choose a filter based on cost. Instead choose whichever one is simpler. If you are processing huge volumes of data, to the point where cost is a significant factor, then benchmark each option and measure the cost.
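If you do get to the point of benchmarking, something along these lines could be a starting point. This is only a sketch: the sample log line, the `d_`/`g_` field names, the `bench_dissect`/`bench_grok` ids, and the event count are all made up for illustration, and both filters are given different target field names so that grok does not append to the fields dissect already created.

```
input {
  # Replay one hypothetical sample line many times
  generator {
    lines => ["2024-01-15T10:00:00Z INFO web01 Request completed"]
    count => 1000000
  }
}

filter {
  # Delimiter-based parsing: the last key takes the rest of the line
  dissect {
    id => "bench_dissect"
    mapping => { "message" => "%{d_timestamp} %{d_level} %{d_host} %{d_msg}" }
  }

  # Regex-based parsing of the same line; anchored so failed matches stop early
  grok {
    id => "bench_grok"
    match => { "message" => "^%{TIMESTAMP_ISO8601:g_timestamp} %{LOGLEVEL:g_level} %{HOSTNAME:g_host} %{GREEDYDATA:g_msg}$" }
  }
}

output {
  # Print one dot per event so console output stays cheap
  stdout { codec => dots }
}
```

While this runs, the node stats API (by default `curl -s 'localhost:9600/_node/stats/pipelines?pretty'`) reports `duration_in_millis` for each filter, which you can look up by its `id` and compare. Use lines from your real data rather than a single sample, since grok's cost depends a lot on how often and how late patterns fail to match.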