Logstash cannot parse user_agent field of nginx

I have nginx logs with the following format:

192.168.0.1 - - [18/Jul/2022:11:20:28 +0000] "GET / HTTP/1.1" 200 15 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-"
192.168.128.1 - - [18/Jul/2022:13:22:15 +0000] "GET / HTTP/1.1" 200 615 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-"

I am using the following pipeline to parse them and store them in Elasticsearch:

input {
    beats {
        port => 5044
    }
}

filter {
    grok {
        match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}"]
    }
    mutate {
        convert => ["response", "integer"]
        convert => ["bytes", "integer"]
        convert => ["responsetime", "float"]
    }
    geoip {
        source => "clientip"
        target => "geoip"
        add_tag => [ "nginx-geoip" ]
    }
    date {
        match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
    }
    useragent {
        source => "agent"
    }
}

output {
    elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "weblogs-%{+YYYY.MM.dd}"
        document_type => "nginx_logs"
        user => "elastic"
        password => "changeme"
    }
    stdout { codec => rubydebug }
}

However, the useragent filter does not seem to work, since I cannot see any user agent fields in the output:

{
    "httpversion" => "1.1",
       "clientip" => "192.168.0.1",
          "ident" => "-",
      "timestamp" => "18/Jul/2022:11:20:28 +0000",
           "verb" => "GET",
     "@timestamp" => 2022-07-18T11:20:28.000Z,
       "@version" => "1",
           "tags" => [
        [0] "beats_input_codec_plain_applied",
        [1] "_geoip_lookup_failure"
    ],
           "host" => {
        "name" => "9a852bd136fd"
    },
           "auth" => "-",
          "bytes" => 15,
       "referrer" => "\"-\"",
          "geoip" => {},
        "message" => "192.168.0.1 - - [18/Jul/2022:11:20:28 +0000] \"GET / HTTP/1.1\" 200 15 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36\" \"-\"",
       "response" => 200,
          "agent" => {
             "version" => "7.3.2",
        "ephemeral_id" => "0c38336d-1e30-4aaa-9ba8-20bd7bd8fb48",
                "type" => "filebeat",
            "hostname" => "9a852bd136fd",
                  "id" => "8991142a-95df-4aed-a190-bda4649c04cd"
    },
          "input" => {
        "type" => "log"
    },
        "request" => "/",
     "extra_fields" => " \"-\"",
            "log" => {
          "file" => {
            "path" => "/var/log/nginx/access.log"
        },
        "offset" => 11021
    },
            "ecs" => {
        "version" => "1.0.1"
    }
}

What I need is to have a field including the whole http_user_agent content. Any idea of what is causing the error?

I think this is an ECS compatibility issue. When I run your pipeline I get a field called [user_agent], not [agent]. If I add ecs_compatibility => "disabled" to the grok filter, then I do not get the user agent in the [agent] field, because Filebeat has already created an [agent] object with details of the beat. I also get errors like this:

:exception=>TypeError, :message=>"cannot convert instance of class org.jruby.RubyHash to class java.lang.String"

You can change ecs_compatibility at the filter, pipeline, or instance level.
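
As a rough sketch of the filter-level option: the block below disables ECS naming on the grok filter and first moves Filebeat's [agent] object aside, on the assumption that this object is what stops grok from writing the user agent string into [agent]. The field name beat_agent is a hypothetical choice for illustration, and ecs_compatibility is only accepted by plugin versions that support it.

```
filter {
    # Move Filebeat's own [agent] object out of the way before grok runs.
    # "beat_agent" is a hypothetical name chosen for this sketch.
    mutate {
        rename => { "agent" => "beat_agent" }
    }
    grok {
        # Filter-level setting: use the legacy field names
        # (clientip, response, bytes, agent, ...).
        ecs_compatibility => "disabled"
        match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}"]
    }
    useragent {
        # With the rename above, [agent] should now be the string captured
        # by grok rather than the Filebeat hash.
        source => "agent"
    }
}
# Pipeline or instance level alternative (check the documentation for your version):
# set pipeline.ecs_compatibility in pipelines.yml or logstash.yml instead.
```

The rename has to come before grok, because filters run in the order they are listed.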

First of all, I did not understand whether I should use ecs_compatibility => "disabled" and, if so, where to put it.

But even if I am able to parse it, I do not want separate fields like the following:

    {
        "name"=>"Firefox",
        "version"=>"45.0", # since plugin version 3.3.0
        "major"=>"45",
        "minor"=>"0",
        "os_name"=>"Mac OS X",
        "os_version"=>"10.11", # since plugin version 3.3.0
        "os_full"=>"Mac OS X 10.11",
        "os_major"=>"10",
        "os_minor"=>"11",
        "device"=>"Mac"
    }

but one field to include the whole http_user_agent, like this one:

user_agent  => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"

That is what I get by default. You need to look at the setting of ecs_compatibility at the instance, pipeline, and filter levels, and read your Logstash logs, which will have messages related to it.
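
If a single flat field holding the whole string is wanted, one option is to copy the raw value that grok captures and leave out the useragent filter. This is only a sketch: it assumes grok is running in ECS mode, where COMBINEDAPACHELOG is expected to store the raw header in [user_agent][original], and http_user_agent is a hypothetical field name.

```
filter {
    grok {
        match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}"]
    }
    # No useragent filter here, since the parsed sub-fields are not wanted.
    # Copy only when grok actually produced the raw user agent string.
    if [user_agent][original] {
        mutate {
            copy => { "[user_agent][original]" => "http_user_agent" }
        }
    }
}
```

The stdout { codec => rubydebug } output already in the pipeline can be used to check which of the two field names actually appears on a given installation.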
