LS 6.3.2 Updating document with full content from multiple log files

Hi Team,

I have an application that writes lines to a log file with a unique UUID in its file name. When the log file grows to a certain size it gets rolled, and the files are not updated after they are rolled. E.g.:

UUID_MY_FILE.log.2 => This is the oldest file
UUID_MY_FILE.log.1 => This is the second oldest file
UUID_MY_FILE.log   => This is the newest file with new log lines being written to it.

What I want to do is: as the logs get rolled and new data is added, get the combined content of all the log files into a single document (I will generate an identifiable document ID for each document from the log file's UUID).
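For illustration, something like the grok filter below could pull that UUID out of the file path into its own field to use as the document ID later. The field name log_uuid is just a placeholder, and this assumes the UUID in the file name is a standard hyphenated UUID:

filter {
  # The file input stores the full file path in the "path" field.
  # %{UUID} is one of grok's built-in patterns, so this copies the UUID
  # from the file name into a separate "log_uuid" field.
  grok {
    match => { "path" => "%{UUID:log_uuid}" }
  }
}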

What would be the best / optimal way to achieve this? I reckoned I'd better ask for advice here first, as this scenario has probably been covered before.

My own approach, which I'm in the middle of testing for feasibility, is:

  • Use the multiline codec in the file input plugin so that it reads the whole file as one event:

    start_position        => "beginning"
    
    codec => multiline {
      pattern             => ".*"
      negate              => "true"
      what                => "previous"
    }
    
  • Files are grouped into a single document using the UUID information in the log files' names.

  • Then use upsert to update the existing doc (rough sketch below).
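For the upsert part I have something roughly like the output below in mind. The hosts and index values are placeholders, and log_uuid is the field extracted from the file name as sketched above:

output {
  elasticsearch {
    hosts         => ["localhost:9200"]   # placeholder
    index         => "app-events"         # placeholder
    document_id   => "%{log_uuid}"        # one document per UUID
    action        => "update"
    doc_as_upsert => true                  # create the document if it doesn't exist yet
  }
}

One caveat I'm aware of: a plain doc_as_upsert replaces a field such as message with the latest value rather than appending to it, so actually accumulating the lines from the rolled files onto the existing document will need something more than this.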

Thanks for your help,

Putting the whole logfile into a single document sounds like a really bad idea. Why would you want to do that? How big are the files?

Sorry, I should have mentioned that the files are not big; each file might have a few hundred lines, totaling a few hundred KB. I would definitely not do this if the files were huge :grin:.

The reason I would like to do this comes down to how the app behaves and how users would like to see the data.

In summary, the app generates / executes some "events", and for each event a small amount of logs is stored in a group of files as described above (identified by a UUID in the log files' names). So for clarity and ease of use purposes, we just want to put the logs of each event (which happen to reside in multiple files) into a single document.

Okay, I see. That's going to be tricky. The approach you're attempting sounds reasonable.

"So for clarity and ease of use purposes"

Could you elaborate on this?

I originally proposed to just tail the log files normally and send the logs line by line, i.e. have one document per log line, but that was rejected :sob: because there would be an unwieldy number of documents showing in Kibana for each of our app's events.

The preference is to have one document per app event, which makes things clearer when looking in Kibana: you expand a document and see all the logs related to that particular event generated by the app.

I managed to figure this out with the help of other contributors from this thread: https://discuss.elastic.co/t/ls-6-3-2-appending-data-to-a-field-of-an-existing-document/

This is the input part; the filter and output parts are in the above thread:

input {
  file {
    sincedb_path          => "some_path"
    path                  => "some_log_path"
    type                  => "some_type"
    start_position        => "beginning"
    # With pattern ".*" and what => "previous", every line matches and is
    # joined onto the previous one, so the whole file becomes a single event.
    codec => multiline {
      pattern             => ".*"
      what                => "previous"
      max_lines           => 20000
    }
  }
}

However, before we get too excited: I can see that the log lines sometimes do not end up in the right chronological order. I think this is because Elasticsearch persists things asynchronously, so when Logstash queries Elasticsearch some newer log events are returned before older ones, and the log lines in the updated "message" field then appear in the wrong chronological order.
