Dealing with repeated upstream fields in a Nginx log

(Simon Oxwell) #1


I'm trying to parse Nginx logs where Nginx is acting as a front-side proxy to multiple upstream servers in such a way that a failure for one upstream server to respond will cause Nginx to try another server. As such, Nginx logs multiple upstream servers and multiple upstream response times.

So an access log entry basically looks like this: - - [30/Jun/2017:09:34:28 +1000] http, 0.000, 15.053 "GET /uri HTTP/1.1" 200 3960 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:54.0) Gecko/20100101 Firefox/54.0" "-"

Which we've been parsing with a grok filter like this:

  grok {
    match => { "message" => "%{IPORHOST:clientip} - %{USER:ident} \[%{HTTPDATE:timestamp}\] (%{NUMBER}|%{NOTSPACE:scheme}) %{HOSTNAME:httphost} (%{HOSTNAME:upstreamip}:%{NUMBER}|-), (%{HOSTNAME:upstreamip}:%{NUMBER}|-) (%{NUMBER:upstream_response_time:float}|-), (%{NUMBER:upstream_response_time:float}|-) \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes:int}|-) %{QS:referrer} %{QS:agent} %{QS:x_forwarded_for}" }
    add_field => { "upstream_failed" => "true" }

This produces multiple upstreamip fields, and multiple upstream_response_time fields.

Currently, in kibana, the multiple upstreamip fields are shown as a comma separated list (eg, ',', or as array in JSON view), but the upstream_response_time field shows only the first value, or possibly no value, showing only '0'.

So, not quite understanding why upstream_response_time isn't showing as a comma separated list/array.

Alternatively, is there a better way of handling this sort of log entry? Ultimately, I need to be able to access the sum of the upstream_response_time values, potentially via some aggregator in elasticsearch (rather than a raw entry?), whilst also preserving the original list of times.

Any suggestions are welcomed.

Running Logstash 5.4.0, Elasticsearch 5.4.0, Kibana 5.4.0, and the nginx files are shipped to logstash with filebeat 5.4.3.



(Simon Oxwell) #2

Eventually dealt with this by doing this:

  grok {
    match => { "message" => "%{IPORHOST:clientip} - %{HTTPDUSER:ident} \[%{HTTPDATE:timestamp}\] (%{NUMBER}|%{NOTSPACE:scheme}) (%{HOSTNAME:httphost}|_) (?<upstreamip>%{HOSTNAME}(:%{NUMBER})?(, %{HOSTNAME}(:%{NUMBER})?)*) (?<upstream_response_times>(%{NUMBER}|-)(, (%{NUMBER}|-))*) \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes:int}|-) %{QS:referrer} %{QS:agent} %{QS:x_forwarded_for}( %{NUMBER:request_time:float})?" }
  # don't try anything else if we still can't parse
  if !("_grokparsefailure" in [tags]) {
    # remove ':<portnum>' from the string of upstream servers
    # (Two mutate filters to preserve order)
    mutate {
      gsub => [ "upstreamip", ":\d*", "" ]
    # convert the comma separated string to an array
    mutate {
      split => ["upstreamip", ", "]
    # sum the upstream response times and remove the raw field
    ruby {
      code => "event.set('upstream_response_time', event.get('upstream_response_times').gsub(/-/, '0').split(', ').map(&:to_f).inject(:+))"
      remove_field => [ 'upstream_response_times' ]

I don't know if that's the most efficient, but it works well enough.

(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.