I have a log file which looks like this. It's basically a tab-separated text file with a few lines of metadata text. From the metadata I want to extract only "TIME". For the tab-separated lines, I want to ignore the data under "DOTS HEARD FROM BUT NOT IN CONFIGURATION" and "REPEATERS". I am using the csv filter with a tab as the separator, but I am not able to distinguish between the sections.
I would do something very similar to the solution I proposed for the other format you had...
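if [message] =~ /^(\s*$|:::::)/ {
    # drop blank lines and ":::::" separator lines
    drop {}
} else if [message] =~ /^TIME/ {
    # parse it and stash it in a ruby class variable
} else if [message] =~ " .* .* " {
    # the whitespace in the pattern above and in the separator below is meant to be a literal tab
    csv { separator => " " autodetect_column_names => true }
    # and append the metadata
} else {
    drop {}
}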
Hey @Badger, I did try a similar solution before posting this question. The issue I am facing is that three column header lines match the condition " .* .* ". I want to drop two of them because they have no data rows under them, and parse only the header that has data rows beneath it. I hope that makes my concern clear.
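One way to reason about the three matching header lines is to track which section the current line belongs to, and to emit an event only when a data row follows a header. The following is a minimal standalone Ruby sketch of that idea, not a tested Logstash configuration. It assumes a layout the thread does not confirm: each section title sits on its own line without tabs, followed by a tab-separated header line and zero or more data rows. The function name parse_log and the sample section name "CONFIGURED DOTS" used in the usage note are hypothetical.

```ruby
# Hypothetical sketch of section-aware parsing, written outside Logstash.
# Assumption: a section title is a non-tabular line; the next tabular line
# is the header; blank or ":::::" lines end the current table.

SKIPPED_SECTIONS = [
  "DOTS HEARD FROM BUT NOT IN CONFIGURATION",
  "REPEATERS"
].freeze

def parse_log(lines)
  time = nil        # value taken from the TIME metadata line, once seen
  skipping = false  # true while inside a section we want to ignore
  header = nil      # column names of the current table, if any
  rows = []

  lines.each do |raw|
    line = raw.chomp
    if line.strip.empty? || line.start_with?(":::::")
      header = nil                       # blank/separator lines end the table
    elsif line.start_with?("TIME")
      time = line.sub(/^TIME[\s:]*/, "") # keep only the value after "TIME"
    elsif SKIPPED_SECTIONS.any? { |title| line.include?(title) }
      skipping = true                    # ignore everything until the next title
      header = nil
    elsif !line.include?("\t")
      skipping = false                   # any other title starts a wanted section
      header = nil
    elsif skipping
      next                               # header or data row in an ignored section
    elsif header.nil?
      header = line.split("\t")          # first tabular line is the header
    else
      # a data row: pair it with the header and attach the TIME metadata
      rows << header.zip(line.split("\t")).to_h.merge("TIME" => time)
    end
  end
  rows
end
```

Because a header on its own never produces a row, a section that has column headers but no data under them yields nothing, and the two named sections are skipped even when they do contain rows. The same state (current section, current header, TIME value) could in principle be kept in a ruby filter, with the caveat that order-dependent state like this only works when events are processed in order by a single worker.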