I'm building an offline log parser using Logstash, Elasticsearch (ES) and Kibana. In this scenario, the servers provide their logs in a zip file. The logs are placed in a volume mounted into the Logstash container, then pushed into ES and viewed in Kibana.
This means the server that originally generated the logs is not the server sending the events to ES.
What I'd like to know is how to get Logstash to parse the hostname from the offline logs and then apply it to all events.
I actually don't think this is possible. I suspect Logstash streams lines from the log file, so if the hostname appears on line 100, it cannot go back and update the host value on the preceding events.
Another idea, which would work outside of Logstash, is to preprocess the offline logs to extract the hostname. This raises the slightly different question of how to feed this externally parsed value into Logstash.
Here is a sample in which you can see that the host running the process is server01:
2021-09-02 12:59:22.129+0000 INFO [com.example.WebServer] Gracefully shutting down process
2021-09-02 12:59:22.129+0000 INFO [com.example.WebServer] Shutdown comple
2021-09-02 13:21:11.452+0000 INFO [com.example.WebServer] Web server starting up
--- Server Start up ---
Operating System: Windows Server 2012 R2; version: 6.3; arch: amd64; cpus: 4
Process id: 4104@server01
2021-09-02 13:21:14.862+0000 INFO [com.example.LocksFactories] Locking selected
2021-09-02 13:21:15.307+0000 INFO [com.example.SIModule] ServerId{eb63dcec} (eb63dcec-5ac0-4a88-8ffe-c3efa1437fbc)
2021-09-02 13:21:16.844+0000 INFO [com.example.WebServer] Running version 1.0
I can parse out the host using grok; that's not the issue. The challenge is adding the parsed host to all events, both before and after the line where it appears.
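Because the whole file is available offline, the preprocessing idea can be done in two passes: first scan the file for the host, then re-emit every line with that host attached. Below is a minimal Python sketch of that approach. It assumes each log file comes from a single host and that the host appears on a line of the form "Process id: <pid>@<host>", as in the sample above. The function names (find_hostname, tag_lines, preprocess) and the JSON field names ("host", "message") are hypothetical choices for illustration. Writing JSON lines is one option; Logstash could then read them with its json codec rather than parsing plain lines.

```python
import json
import re
from typing import Iterable, Iterator, List, Optional

# Hypothetical pattern for lines such as "Process id: 4104@server01";
# captures everything after the "@" as the host.
PROCESS_ID_RE = re.compile(r"^Process id:\s*\d+@(?P<host>\S+)$")

SAMPLE = """\
2021-09-02 12:59:22.129+0000 INFO [com.example.WebServer] Gracefully shutting down process
2021-09-02 12:59:22.129+0000 INFO [com.example.WebServer] Shutdown comple
2021-09-02 13:21:11.452+0000 INFO [com.example.WebServer] Web server starting up
--- Server Start up ---
Operating System: Windows Server 2012 R2; version: 6.3; arch: amd64; cpus: 4
Process id: 4104@server01
2021-09-02 13:21:14.862+0000 INFO [com.example.LocksFactories] Locking selected
2021-09-02 13:21:15.307+0000 INFO [com.example.SIModule] ServerId{eb63dcec} (eb63dcec-5ac0-4a88-8ffe-c3efa1437fbc)
2021-09-02 13:21:16.844+0000 INFO [com.example.WebServer] Running version 1.0
"""


def find_hostname(lines: Iterable[str]) -> Optional[str]:
    """First pass: return the host from the first 'Process id' line, or None."""
    for line in lines:
        match = PROCESS_ID_RE.match(line.strip())
        if match:
            return match.group("host")
    return None


def tag_lines(lines: Iterable[str], host: Optional[str]) -> Iterator[str]:
    """Second pass: emit one JSON object per non-empty line, each carrying the host."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield json.dumps({"host": host, "message": line})


def preprocess(text: str) -> List[str]:
    """Run both passes over the full text of one log file."""
    lines = text.splitlines()
    host = find_hostname(lines)  # requires the whole file, hence the two passes
    return list(tag_lines(lines, host))


if __name__ == "__main__":
    for record in preprocess(SAMPLE):
        print(record)
```

Run against the sample, every one of the nine lines comes out with the host set to server01, including the five lines that precede the "Process id" line. This sidesteps the streaming limitation because the host is resolved before any event reaches Logstash.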