We are trying to send messages from an application server to Kafka, but some rows are too long, so we have to drop part of the row. We can filter out the long messages with a ruby filter, but we need to process them further so that we do not lose these messages. An example row is as below:
You can e.g. use a mutate filter and its gsub option to delete everything between two braces, but if braces can occur elsewhere in the log messages that might not be a great idea.
Another option is to use a csv filter (with || as the separator) to separate the string into fields, use e.g. a mutate filter to process the field with the JSON string, and then piece everything back again (unless you want to send a JSON object to Kafka).
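To make the two options concrete, here is a minimal sketch in plain Ruby. It only mirrors the logic: the row layout, the field index, and the helper names `strip_braces` and `rewrite_field` are assumptions for illustration and are not part of any Logstash plugin. Inside a ruby filter the string would come from `event.get("message")`.

```ruby
# Hypothetical row layout: fields separated by "||", one field holding a JSON string.
SEPARATOR = '||'

# Option 1 (gsub-style): delete everything from the first "{" to the last "}".
# Risky if braces can also appear in other parts of the row.
def strip_braces(row)
  row.gsub(/\{.*\}/, '')
end

# Option 2 (csv-style): split into fields, rewrite one field, join the row again.
# The limit of -1 keeps trailing empty fields so the row shape is preserved.
def rewrite_field(row, index)
  fields = row.split(SEPARATOR, -1)
  fields[index] = yield(fields[index])
  fields.join(SEPARATOR)
end

row = 'ts||INFO||{"a":1}||done'
puts strip_braces(row)
puts rewrite_field(row, 2) { |_json| '{}' }
```

The second approach touches only the field you name, so braces elsewhere in the row are left alone.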
The exact problem is that we don't want to parse every event, only the long ones, because parsing every event is too costly on our side. We used a ruby filter to find the long messages, but we couldn't get any further with ruby. So we need a way to parse only the long events that the filter has identified.
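One way to picture "parse only the long events" is a cheap length check that returns short rows untouched before any splitting happens. The sketch below is plain Ruby under stated assumptions: the limit of 100 characters, the "||" separator, and the position of the JSON field are all hypothetical, and `shorten_long_row` is an illustrative name, not an existing API.

```ruby
# Hypothetical values: adjust the limit, separator and field index to the real data.
MAX_ROW_LENGTH = 100
FIELD_SEPARATOR = '||'
JSON_FIELD_INDEX = 2

# Short rows are returned as-is; only rows over the limit pay for the split.
def shorten_long_row(row, max_length = MAX_ROW_LENGTH)
  return row if row.length <= max_length

  fields = row.split(FIELD_SEPARATOR, -1)
  return row if fields.length <= JSON_FIELD_INDEX # unexpected layout, leave untouched

  # Cut the JSON field by exactly the number of characters the row is over the limit.
  overflow = row.length - max_length
  keep = [fields[JSON_FIELD_INDEX].length - overflow, 0].max
  fields[JSON_FIELD_INDEX] = fields[JSON_FIELD_INDEX][0, keep]
  fields.join(FIELD_SEPARATOR)
end

long_row = 'ts||INFO||' + ('x' * 200) + '||done'
puts shorten_long_row(long_row).length
```

Note that cutting a JSON string this way leaves it invalid as JSON, so this only suits the case where the shortened field is sent on as plain text.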