Combine Log Lines That Share A Unique ID


(Cheeseandpepper) #1

I've appended a unique id to the very end of every log line that tags each message as part of a single HTTP request. An HTTP request generally produces 5 log lines in my application, and with multiple users the lines are not necessarily in order. Here's a contrived example:

1 INFO 123.123.123.123 GET /search? params='foo' unique_id=12345ABCDE
2 INFO 111.111.111.111 POST /submit? params='bar' unique_id=ZZZZ8888
3 INFO 123.123.123.123 Completed 200 34ms unique_id=12345ABCDE

My goal is to combine lines 1 and 3 into the same Elasticsearch document by matching on the unique_id. Is there a good solution for this with Logstash? Thanks!


(Mark Walkom) #2

Looks like https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html will do it.
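As a rough sketch of how that could look for the example lines above (field names like unique_id and the regex conditions are my assumptions, and the Ruby event syntax matches the Logstash 2.x era used in this thread) — note that aggregated state goes into the map variable, not the event:

filter {
  # pull the correlation id out of every line
  grok {
    match => { "message" => "unique_id=%{WORD:unique_id}" }
  }

  # first line of the request: start an aggregation map
  if [message] =~ /GET|POST/ {
    aggregate {
      task_id => "%{unique_id}"
      code => "map['request'] = event['message']"
      map_action => "create"
    }
  }

  # last line: copy the accumulated fields onto this event and close the task
  if [message] =~ /Completed/ {
    aggregate {
      task_id => "%{unique_id}"
      code => "event['request'] = map['request']"
      map_action => "update"
      end_of_task => true
      timeout => 120
    }
  }
}

With this pattern the final "Completed" event carries the fields collected from the earlier lines, so a single Elasticsearch document ends up with data from the whole request.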


(Cheeseandpepper) #3

I'm starting to use the logstash-filter-aggregate plugin and I'm having mixed results. Here's my config:

input {
  file {
    path => [
      "/Users/mbp/Desktop/dev.log"
    ]
    start_position => "beginning"
  }
}

filter {

  grok {
    match => { "message" => "(?<trace_id>(?![trace_id=])[\S][^\s]*$)"}
  }

  grok {
    match => { "message" => "(?<response_time>OK in (.*)ms)"}
  }

  grok {
    match => { "message" => "(?<status_code>Completed (\d+))"}
  }

  if [message] =~ /search.json/ {
    aggregate {
      task_id => "%{trace_id}"
      code => "event['search_request'] = event['message']; event['query_type'] = 'search'"
      map_action => "create"
    }
  }

  if [message] =~ /Headers: {/ {
    aggregate {
      task_id => "%{trace_id}"
      code => "event['headers'] = event['message']"
      map_action => "update"
    }
  }

  if [message] =~ /INFO Completed/ {
    aggregate {
      task_id => "%{trace_id}"
      code => "event['status_code'] = %{status_code}; event['response_time'] = %{response_time}"
      map_action => "update"
      end_of_task => true
      timeout => 120
    }
  }

}


output {

  stdout { codec => rubydebug }

  elasticsearch_http {
    host => ["localhost"]
    port => "9200"
    index => "logstash-%{+YYYY.MM.dd}"
  }

}

There are 2 main problems. First, I'm not seeing the aggregated records in Elasticsearch; that is, there's no single record that contains the contents of the 3 aggregations.

The 2nd is that I'm getting a ton of extra documents being stored whose messages are an empty string. I suspect that grokking the message 3 times is messing things up.

I would appreciate any advice. Thanks! [edit: apologies for the code formatting, it's hard to do in a browser]


(Mark Walkom) #4

When you specify message multiple times like that, it'll try to match all 3.
What you're supposed to do when grokking is match the entire line, not just part of it.
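For example, a single grok that matches one of the sample lines end to end might look like this (the pattern and field names are illustrative, written against the "Completed" line from the first post):

filter {
  grok {
    # matches e.g.: 3 INFO 123.123.123.123 Completed 200 34ms unique_id=12345ABCDE
    match => { "message" => "%{NUMBER:line_no} %{LOGLEVEL:level} %{IP:client_ip} Completed %{NUMBER:status_code} %{NUMBER:response_time}ms unique_id=%{WORD:trace_id}" }
  }
}

Because the whole line is consumed by one pattern, every field is captured in a single pass and you avoid the empty-message documents that come from partial matches and _grokparsefailure events.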

