I have an application that writes lines to a log file whose name contains a unique UUID. When the log file grows to a certain size it gets rolled. The files are not modified after they are rolled. E.g.:
UUID_MY_FILE.log.2 => This is the oldest file
UUID_MY_FILE.log.1 => This is the second oldest file
UUID_MY_FILE.log => This is the newest file with new log lines being written to it.
What I want to do is: as the logs get rolled and new data is added, get the combined content of all the log files into a single document (I will generate an identifiable document ID for each document from the log file's UUID).
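To make the "single document per UUID" idea concrete, the output side could look something like this. This is just a sketch, not my working config: it assumes a `log_uuid` field has already been extracted from the file path earlier in the pipeline (e.g. with grok), and the hosts and index name are placeholders:

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "app-events"
    # Use the UUID from the file name as the document ID, so content
    # from all rolled files of one event lands in the same document.
    document_id => "%{[log_uuid]}"
    # Update the existing document, or create it if it doesn't exist yet.
    action => "update"
    doc_as_upsert => true
  }
}
```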
What would be the best / optimal way to achieve this? I reckoned I'd better ask for advice here first, as this scenario has probably been covered before.
But my own approach which I'm in the middle of testing for feasibility is:
Use the multiline codec in the file input plugin to read the whole file as one event:
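Roughly like this — a minimal sketch, where the path and sincedb location are placeholders and the flush/line limits are guesses I'd still need to tune:

```
input {
  file {
    path => "/var/log/myapp/*_MY_FILE.log*"
    start_position => "beginning"
    sincedb_path => "/var/lib/logstash/sincedb_myapp"
    codec => multiline {
      # "^" matches every line, and "what => previous" appends each
      # line to the one before it, so the whole file becomes one event.
      pattern => "^"
      what => "previous"
      # Flush the accumulated event after 2 s with no new lines,
      # since there is no "next file" marker to trigger it.
      auto_flush_interval => 2
      max_lines => 10000
    }
  }
}
```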
Sorry, I should have mentioned: the files are not big. Each file might have a few hundred lines, totalling a few hundred KB. I would definitely not do this if the files were huge.
The reason why I would like to do this is mainly because of how the app behaves and how users would like to see the data.
In summary, the app generates / executes some "events", and for each event a small amount of logs is stored in a group of files as described above (identified by a UUID in the log files' names). So, for clarity and ease of use, we just want to put the logs of each event (which happen to reside in multiple files) into a single document.
I originally proposed to just tail the log files normally and send the logs line by line, i.e. one document per log line, but that was rejected because Kibana would then show an unwieldy number of documents for each of our app's events.
The preference is to have one document per app event, which would make things clearer in Kibana: you expand a document and see all the logs related to that particular event generated by the app.
One caveat, though, before we get too excited: I can see that the log lines sometimes do not end up in chronological order. I think this is because Elasticsearch persists things asynchronously, so when Logstash queries Elasticsearch, some newer log events are returned before older ones, and the log lines in the updated "message" field then appear out of chronological order.
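One mitigation I'm considering is re-sorting the merged lines in a ruby filter before indexing. A minimal sketch (the helper name is mine, and it assumes every log line starts with an ISO-8601 timestamp, so lexicographic order equals chronological order):

```ruby
# Sort the lines of a merged multi-line message chronologically.
# Assumes each line begins with an ISO-8601 timestamp, e.g.
# "2024-01-15T10:23:01.123 some message".
def sort_log_lines(message)
  message.split("\n")
         .sort_by { |line| line[/\A\S+/] || "" } # sort on the leading timestamp token
         .join("\n")
end
```

Inside a Logstash ruby filter this would be called as something like `event.set('message', sort_log_lines(event.get('message')))`.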