Logstash is only parsing half of my Nginx log

I have the following log coming from an NGINX reverse proxy setup.

Message:

192.168.1.24 - - [19/Apr/2020:15:39:03 +0200] "GET /website/static/src/scss/options/colors/website.assets_wysiwyg/user_theme_color_palette.scss.css HTTP/1.0" 304 0 "https://maindomain.ch/impressum" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"

The last part (the user agent string, in bold) is not parsed into fields. There is one more problem: the referrer field includes the surrounding quotes, so I have "https://maindomain.ch/impressum" in the field. How can I fix that?

This is the logstash filter:

if [host][name] == "SVGXXX-XXXX-01.maindomain.ch" {
  if [event][module] == "nginx" {
    if [fileset][name] == "access" {
      mutate {
        add_tag => ["anginx", "Anginx"]
      }
      if "anginx" in [tags] {
      grok {
        match => { "message" => "%{COMBINEDAPACHELOG}+%{(?:"(?:%{URI:referrer}|-)"|%{QS:referrer})}+%{GREEDYDATA:extra_fields}" }
       # remove_field => "message"
      }
      mutate {
        add_field => { "read_timestamp" => "%{@timestamp}" }
      }
      date {
        match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
        remove_field => "[nginx][access][time]"
      }
      useragent {
        source => "[nginx][access][agent]"
        target => "[nginx][access][user_agent]"
        remove_field => "[nginx][access][agent]"
      }
      geoip {
        source => "[nginx][access][remote_ip]"
        target => "[nginx][access][geoip]"
       }
      }
    }
  }
}

Why not just use HTTPD_COMBINEDLOG, which includes quoted strings for agent and referrer? QS always includes the quotes in the match, so you would need to define your own patterns to avoid this.

When I use HTTPD_COMBINEDLOG it doesn't change anything. I tried different variants now.

Ok, can you help with creating my own patterns?

HTTPD_COMBINEDLOG is

HTTPD_COMBINEDLOG %{HTTPD_COMMONLOG} %{QS:referrer} %{QS:agent}

Just remove the

+%{(?:"(?:%{URI:referrer}|-)"|%{QS:referrer})}+%{GREEDYDATA:extra_fields}

from your pattern. QS is rather complicated

QUOTEDSTRING (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))

It handles both single and double quotes. There is a negative lookbehind to avoid matching an opening double quote that is escaped, and there are several atomic groups that may or may not be performance optimizations. You could change it to avoid including the quotes in the result (by moving the two outermost parentheses into each of the four alternated patterns), but it would be way easier to use
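To see why QS leaves the quotes in the field, here is a small Python sketch using a simplified version of the double-quote branch of QUOTEDSTRING (the grok internals differ; this is just the idea):

```python
import re

line = '304 0 "https://maindomain.ch/impressum" "Mozilla/5.0"'

# Simplified double-quote half of QUOTEDSTRING: the quotes are part of
# the match, so the extracted value keeps them.
with_quotes = re.search(r'"(?:\\.|[^\\"])*"', line).group(0)

# Moving the quotes outside the capturing group excludes them from the
# captured value, which is the change described above.
without_quotes = re.search(r'"((?:\\.|[^\\"])*)"', line).group(1)

print(with_quotes)     # → "https://maindomain.ch/impressum"
print(without_quotes)  # → https://maindomain.ch/impressum
```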

mutate { gsub => [ "referrer", '^"', '', "referrer", '"$', '' ] }
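A rough Python equivalent of what that mutate/gsub does, two anchored substitutions stripping one leading and one trailing double quote:

```python
import re

def strip_quotes(value: str) -> str:
    """Python sketch of:
    mutate { gsub => [ "referrer", '^"', '', "referrer", '"$', '' ] }
    """
    value = re.sub(r'^"', '', value)  # drop one leading double quote
    value = re.sub(r'"$', '', value)  # drop one trailing double quote
    return value

print(strip_quotes('"https://maindomain.ch/impressum"'))
# → https://maindomain.ch/impressum
```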

When I have it like this:

if [event][module] == "nginx" {
  if [fileset][name] == "access" {
    mutate {
      add_tag => ["anginx", "Anginx"]
    }
    if "anginx" in [tags] {
      grok {
        match => { "message" => "%{HTTPD_COMBINEDLOG}+%{HTTPD_COMMONLOG}+%{QS:referrer}+%{QS:agent}" }
        # remove_field => "message"
      }
      mutate {
        add_field => { "read_timestamp" => "%{@timestamp}" }
      }

No fields are matched in Kibana. Would I add the mutate like this?

if [event][module] == "nginx" {
  if [fileset][name] == "access" {
    mutate {
      add_tag => ["anginx", "Anginx"]
    }
    if "anginx" in [tags] {
      grok {
        match => { "message" => "%{HTTPD_COMBINEDLOG}+%{HTTPD_COMMONLOG}+%{QS:referrer}+%{QS:agent}" }
        # remove_field => "message"
      }
      mutate {
        gsub => [ "referrer", '^"', '', "referrer", '"$', '' ]
      }
      mutate {
        add_field => { "read_timestamp" => "%{@timestamp}" }
      }

Your pattern requires that the message field contains one or more HTTPD_COMBINEDLOG, followed by one or more HTTPD_COMMONLOG, followed by one or more quoted strings, followed by one or more quoted strings. I would not expect that to match the example message you quoted.
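The mismatch can be seen with a toy version in Python. The names and patterns below are illustrative stand-ins, not the real grok definitions: one regex is shaped like the correct pattern, the other like the config above, with every piece repeated by + and two full line formats concatenated:

```python
import re

line = 'abc "ref" "agent"'

word = r'\w+'    # toy stand-in for a grok pattern like WORD
qs = r'"[^"]*"'  # toy stand-in for QS (no escape handling)

# Shaped like %{HTTPD_COMMONLOG} %{QS:referrer} %{QS:agent}: matches.
good = re.fullmatch(rf'{word} {qs} {qs}', line)

# Shaped like the config above: every piece repeated with '+' and two
# full line formats concatenated. After the first part consumes the
# whole line there is nothing left for the rest, so it never matches.
bad = re.fullmatch(rf'(?:{word} {qs} {qs})+(?:{word})+(?:{qs})+(?:{qs})+', line)

print(good is not None, bad is not None)  # → True False
```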

I now have the following patterns. The referrer field no longer includes the quotes, this works fine, thank you.
But %{QS:referrer} %{QS:agent} doesn't work.

When I just use: HTTPD_COMBINEDLOG %{HTTPD_COMMONLOG} %{QS:referrer} %{QS:agent} nothing is parsed.

if [event][module] == "nginx" {
  if [fileset][name] == "access" {
    mutate {
      add_tag => ["anginx", "Anginx"]
    }
    if "anginx" in [tags] {
      grok {
        match => { "message" => '%{HTTPD_COMMONLOG} %{QS:referrer} %{QS:agent}' }
        # remove_field => "message"
      }
      mutate {
        gsub => [ "referrer", '^"', '', "referrer", '"$', '' ]
      }
      mutate {
        add_field => { "read_timestamp" => "%{@timestamp}" }
      }

I'm sorry, I'm still a newbie when it comes to grok patterns. I have to see how it works, then I study it and hope I understand it.

What do you mean by that?

The message between the arrows is not split up into fields.

"referrer" is fine, sorry this was my mistake.

If you expect the agent field to be parsed you will need to add a useragent filter.

This is probably the solution: I first need to install a useragent filter.

I don't have it right now. Which one do I need?

Does this version work in Logstash 7.5.x and 7.6.x?

I have CentOS 8 installed; that shouldn't be a problem, should it?

It ships with the default logstash package, so you do not need to install it.

Hi Badger,

OK, I think I understand it, maybe 50%. So yes, I already have useragent in my filter. Probably I just need to modify those lines?

I'm at exactly the same point as this topic: Want a Logstash Filter to Parse the Agent Field in Apache Access Log Files

if [host][name] == "SVGXXX-XXXXX-XX.maindomain.ch" {
  if [event][module] == "nginx" {
    if [fileset][name] == "access" {
      mutate {
        add_tag => ["anginx", "Anginx"]
      }
      if "anginx" in [tags] {
      grok {
        match => { "message" => "%{HTTPD_COMMONLOG} %{QS:referrer} %{QS:agent}" }
       # remove_field => "message"
      }
      mutate {
        gsub => [ "referrer", '^"', '', "referrer", '"$', '' ]
      }
      mutate {
        add_field => { "read_timestamp" => "%{@timestamp}" }
      }
      date {
        match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
        remove_field => "[nginx][access][time]"
      }
      useragent {
        source => "[nginx][access][agent]"
        target => "[nginx][access][user_agent]"
        remove_field => "[nginx][access][agent]"
      }
      geoip {
        source => "[nginx][access][remote_ip]"
        target => "[nginx][access][geoip]"
       }
      }
     }
    }
    if [fileset][name] == "error" {
      grok {
        match => { "message" => ["%{DATA:[nginx][error][time]} \[%{DATA:[nginx][error][level]}\] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: (\*%{NUMBER:[nginx][error][connection_id]} )?%{GREEDYDATA:[nginx][error][message]}"] }
        remove_field => "message"
      }
      mutate {
        rename => { "@timestamp" => "read_timestamp" }
      }
      date {
        match => [ "[nginx][error][time]", "YYYY/MM/dd H:m:s" ]
        remove_field => "[nginx][error][time]"
      }
    }
}

I receive this:

[2020-04-20T20:19:06,289][ERROR][logstash.filters.useragent][main] Uknown error while parsing user agent data {:exception=>#<TypeError: cannot convert instance of class org.jruby.RubyHash to class java.lang.String>, :field=>"[agent]", :event=>#<LogStash::Event:0x8da38e>}

I also found this, but at the moment I haven't found any working solution.

Now I'm a little bit closer to the goal with this config:

  if [event][module] == "nginx" {
    if [fileset][name] == "access" {
      mutate {
        add_tag => ["anginx", "Anginx"]
      }
      if "anginx" in [tags] {
      grok {
        match => { "message" => "%{HTTPD_COMMONLOG} %{QS:referrer} %{QS:user_agent}" }
       # remove_field => "message"
      }
      mutate {
        gsub => [ "referrer", '^"', '', "referrer", '"$', '' ]
      }
      mutate {
        add_field => { "read_timestamp" => "%{@timestamp}" }
      }
      date {
        match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
        remove_field => "[nginx][access][time]"
      }
      useragent {
        source => "[nginx][access][user_agent]"
        target => "[nginx][access][user_agent]"
        remove_field => "[nginx][access][user_agent]"
      }
      geoip {
        source => "[nginx][access][remote_ip]"
        target => "[nginx][access][geoip]"
       }
      }
     }

I now have this in one field called user_agent:
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"

Now I want to split that up. Of course it will contain different information, not always Windows NT 10.0 or Chrome.
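The splitting the useragent filter performs can be roughly sketched in Python. This is a deliberately naive parse (the real filter uses a full regexes database), and the variable names are made up for the example:

```python
import re

ua = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
      'AppleWebKit/537.36 (KHTML, like Gecko) '
      'Chrome/80.0.3987.163 Safari/537.36')

# Naive split: OS string from the first parenthesised group,
# browser name/version from the "Name/version" tokens.
os_part = re.search(r'\(([^);]+)', ua).group(1)
tokens = re.findall(r'(\w+)/([\d.]+)', ua)
browser, version = tokens[-2]  # Chrome token comes before Safari here

print(os_part)           # → Windows NT 10.0
print(browser, version)  # → Chrome 80.0.3987.163
```

Note that in the config above grok writes the field as user_agent, while the useragent filter reads [nginx][access][user_agent]; pointing its source at the field grok actually creates may be what is missing.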

When I use HTTPD_COMBINEDLOG it doesn't change anything. I tried different variants now.

@sunnywilson09 what do you mean?
This has nothing to do with my problem.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.