I've been learning about regex/grok the last few days and I'm trying to put the pieces together on how to remove text from some logs that we're ingesting. I have an example line from the logs below and have been able to match the regex/grok pattern to the text.
Example log line:
T: 2019-09-30 14:11:14,057 |L: INFO |MSG: Start- upsert of allocated status from service for holding date:10/19/2019 4:00:00 AM
RegEx/Grok pattern:
T\W\s*%{TIMESTAMP_ISO8601:timestamp}\s\|L\:\s*%{LOGLEVEL:log-level}\s\|MSG\:\s*%{GREEDYDATA:message}
The grok debugger said that the output would be:
{
"log-level": "INFO",
"message": "Start- upsert of allocated status from service for holding date:10/19/2019 4:00:00 AM",
"timestamp": "2019-09-30 14:11:14,057"
}
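For anyone who wants to check the pattern outside the grok debugger, here is a rough Python sketch of the same match. The sub-patterns standing in for TIMESTAMP_ISO8601, LOGLEVEL and GREEDYDATA are simplified approximations, not the real grok definitions, and `parse_line` is only an illustrative name. Python group names cannot contain a hyphen, so the group is called `log_level` and renamed in the result.

```python
import re

# Simplified stand-ins for the grok patterns (assumptions, not the official definitions):
#   TIMESTAMP_ISO8601 -> date, space or "T", time, optional fractional seconds
#   LOGLEVEL          -> any run of letters
#   GREEDYDATA        -> .*
LINE_RE = re.compile(
    r"T\W\s*(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?)"
    r"\s\|L:\s*(?P<log_level>[A-Za-z]+)"
    r"\s\|MSG:\s*(?P<message>.*)"
)


def parse_line(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "timestamp": m.group("timestamp"),
        "log-level": m.group("log_level"),
        "message": m.group("message"),
    }


if __name__ == "__main__":
    sample = (
        "T: 2019-09-30 14:11:14,057 |L: INFO |MSG: Start- upsert of allocated "
        "status from service for holding date:10/19/2019 4:00:00 AM"
    )
    print(parse_line(sample))
```

The "T:", "|L:" and "|MSG:" markers are consumed by the pattern but not captured, which is how the unwanted text ends up removed from the resulting fields.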
From what I can tell, that is the output I need. What I don't understand is the order of operations. Do I set something up in Filebeat to handle this, or should it be done in Logstash? Overall I'm confused about the simplest and most efficient way to remove the text; from what I'm reading, there are a lot of processors and plugins for both that I could use.
I'd say: it depends. I can't say anything about Filebeat, as I haven't tried something like that with it, but you can do it with either Logstash or Elasticsearch (if you use Elasticsearch as the backend).
If you use Elasticsearch for this, the landscape gets simpler: you only have Filebeat and Elasticsearch.
Logstash introduces another point of failure: another process that must be called (and that might run in another location), so processing time increases. But if you already have a Logstash installation for other pipelines that require it, it might be preferable to keep all your pipelines in a single place: Logstash.
You should also consider where you want to do the configuration. Personally, I prefer the Logstash Pipeline UI in Kibana over Logstash's file-based configuration or the API endpoint for Elasticsearch ingest pipelines.
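For the Elasticsearch-only route, a minimal sketch of what an ingest pipeline definition with a grok processor could look like, built as a Python dict and printed as JSON. The description text and the `build_pipeline` helper are made up for illustration, and I'm assuming the raw line arrives in the `message` field, which is what Filebeat sends by default. I haven't run this against a cluster.

```python
import json

# The grok pattern from the question, unchanged.
GROK_PATTERN = (
    r"T\W\s*%{TIMESTAMP_ISO8601:timestamp}\s\|L\:\s*%{LOGLEVEL:log-level}"
    r"\s\|MSG\:\s*%{GREEDYDATA:message}"
)


def build_pipeline(pattern):
    """Build the request body for an ingest pipeline with a single grok processor."""
    return {
        # Hypothetical description; any text works here.
        "description": "Parse T/L/MSG application log lines",
        "processors": [
            {
                "grok": {
                    # Assumption: the raw log line is in the "message" field.
                    "field": "message",
                    "patterns": [pattern],
                }
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_pipeline(GROK_PATTERN), indent=2))
```

The printed JSON would be the body of a PUT to the `_ingest/pipeline/<name>` endpoint, and Filebeat can then be pointed at that pipeline through the `pipeline` setting of its Elasticsearch output.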
I'll look into using Logstash as the primary tool for this, then. I agree that Logstash introduces another point of failure, but in our environment this will probably grow. It's always nice to get a bit ahead of the curve by doing the hard work up front.
From the reading I just did, the grok plugin handles regex as well, and dissect does not, so I'll start with the grok plugin. If someone knows how this can be done in Filebeat, I'm interested in learning that route too. Appreciate the insight, Wolfram. I will follow up if I have other questions later.