Dealing with repeated upstream fields in an Nginx log

Hi,

I'm trying to parse Nginx logs where Nginx acts as a front-side proxy to multiple upstream servers, so that if one upstream server fails to respond, Nginx retries the request against another server. When that happens, Nginx logs multiple upstream servers and multiple upstream response times in a single access log entry.
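For context, the lines come from a custom log_format roughly like the one below (reconstructed from the sample entry further down, so treat the exact variable list and the format name as an assumption); $upstream_addr and $upstream_response_time are the variables Nginx expands into comma-separated lists when it retries against another upstream:

  log_format upstream '$remote_addr - $remote_user [$time_local] $scheme $host '
                      '$upstream_addr $upstream_response_time '
                      '"$request" $status $body_bytes_sent '
                      '"$http_referer" "$http_user_agent" "$http_x_forwarded_for"';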

So an access log entry basically looks like this:

10.20.30.40 - - [30/Jun/2017:09:34:28 +1000] http test.com 172.0.0.80:8080, 172.0.0.40:8080 0.000, 15.053 "GET /uri HTTP/1.1" 200 3960 "http://test.com/uri" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:54.0) Gecko/20100101 Firefox/54.0" "-"

We've been parsing these entries with a grok filter like this:

  grok {
    match => { "message" => "%{IPORHOST:clientip} - %{USER:ident} \[%{HTTPDATE:timestamp}\] (%{NUMBER}|%{NOTSPACE:scheme}) %{HOSTNAME:httphost} (%{HOSTNAME:upstreamip}:%{NUMBER}|-), (%{HOSTNAME:upstreamip}:%{NUMBER}|-) (%{NUMBER:upstream_response_time:float}|-), (%{NUMBER:upstream_response_time:float}|-) \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes:int}|-) %{QS:referrer} %{QS:agent} %{QS:x_forwarded_for}" }
    add_field => { "upstream_failed" => "true" }
  }

This produces multiple upstreamip fields, and multiple upstream_response_time fields.

Currently, in Kibana, the multiple upstreamip values are shown as a comma-separated list (e.g. '172.0.0.80, 172.0.0.40', or as an array in the JSON view), but the upstream_response_time field shows only the first value, or possibly no value at all, displaying just '0'.

So I'm not quite sure why upstream_response_time isn't showing up as a comma-separated list/array as well.

Alternatively, is there a better way of handling this sort of log entry? Ultimately, I need to be able to access the sum of the upstream_response_time values, potentially via some aggregation in Elasticsearch (rather than on the raw entry?), whilst also preserving the original list of times.

Any suggestions are welcomed.

Running Logstash 5.4.0, Elasticsearch 5.4.0, and Kibana 5.4.0; the Nginx log files are shipped to Logstash with Filebeat 5.4.3.

Thanks,

Simon

I eventually dealt with this as follows:

  grok {
    match => { "message" => "%{IPORHOST:clientip} - %{HTTPDUSER:ident} \[%{HTTPDATE:timestamp}\] (%{NUMBER}|%{NOTSPACE:scheme}) (%{HOSTNAME:httphost}|_) (?<upstreamip>%{HOSTNAME}(:%{NUMBER})?(, %{HOSTNAME}(:%{NUMBER})?)*) (?<upstream_response_times>(%{NUMBER}|-)(, (%{NUMBER}|-))*) \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes:int}|-) %{QS:referrer} %{QS:agent} %{QS:x_forwarded_for}( %{NUMBER:request_time:float})?" }
  }
  # don't try anything else if we still can't parse
  if !("_grokparsefailure" in [tags]) {
    # remove ':<portnum>' from the string of upstream servers
    # (two separate mutate filters so the gsub is guaranteed to run before the split)
    mutate {
      gsub => [ "upstreamip", ":\d*", "" ]
    }
    # convert the comma separated string to an array
    mutate {
      split => ["upstreamip", ", "]
    }
    # sum the upstream response times and remove the raw field
    ruby {
      code => "event.set('upstream_response_time', event.get('upstream_response_times').gsub(/-/, '0').split(', ').map(&:to_f).inject(:+))"
      remove_field => [ 'upstream_response_times' ]
    }
  }
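
For anyone wanting to sanity-check the filters locally, a throwaway pipeline along these lines can help (just a sketch; the stdin/stdout plugins are only for testing and aren't part of the real setup). Paste a raw access-log line on stdin and inspect the resulting event, including the summed upstream_response_time:

  input {
    stdin { }
  }
  filter {
    # the grok / mutate / ruby filters from above go here
  }
  output {
    stdout { codec => rubydebug }
  }

Save it as a config file (e.g. test.conf) and run it with bin/logstash -f test.conf.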

I don't know if that's the most efficient approach, but it works well enough.
