[SOLVED] Filebeat to Logstash best practice

Hi folks,

I'm currently looking over a Filebeat config used to ship Nginx and syslog data to Logstash. The Filebeat config is:

filebeat:
  prospectors:
    - paths:
        - /var/log/syslog
        - /var/log/auth.log
      document_type: syslog
    - paths:
        - /webapp/logs/nginx_access.log
      document_type: nginx-access
    - paths:
        - /webapp/logs/nginx_error.log
      document_type: nginx-error

output:
  logstash:
    hosts: ["logstash:5010"]

The goal is to (eventually) get all of these different doctypes into their own indices, and to override the timestamp with the HTTP access request timestamp, in the case of Nginx.

I'm wondering: what would be considered best practice for doing this? Would a bunch of filters be able to do so? Any help or advice would be hugely appreciated.

I'm wondering: what would be considered best practice for doing this? Would a bunch of filters be able to do so?

Yes. A set of one grok filter and one date filter per log type is pretty typical. See the Logstash configuration examples in the Logstash Reference.
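
For an Nginx access log in the default combined format, the pair could look roughly like this (just a sketch; the stock COMBINEDAPACHELOG grok pattern happens to fit Nginx's combined format, but adjust it if you use a custom log_format):

grok {
  match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
  match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
}

Grok pulls a timestamp field out of the line, and the date filter then uses it to overwrite @timestamp - which also covers your HTTP-request-time requirement.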

Thanks for the quick reply, Magnus! I appreciate it.

I'm pretty new to ES and I'm still wrapping my head around it, but would the Logstash config look something like this for sorting the nginx-access beats from the rest?

input {
  beats {
    port => 5010
    host => "0.0.0.0"
  }
}
filter {
  if [type] == "nginx-access" {
    grok {
      <match rules here>
    }
    mutate {
      rename => { "@timestamp" => "read_timestamp" }
    }
    date {
      match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
      remove_field => "[nginx][access][time]"
      target => "@timestamp"
    }
    useragent {
      source => "[nginx][access][agent]"
      target => "user_agent"
    }
    geoip {
      source => "[nginx][access][remote_ip]"
      target => "geoip"
    }
  }
}
output {
  if [type] == "nginx-access" {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      manage_template => false
      index => "nginx-access-dedicated-%{+YYYY.MM.dd}"
      document_type => "nginx-access-dedicated"
    }
  }
  else
  {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
    }
  }
}

I'm not clear on how the Logstash filter is able to differentiate between the different document types fed to it by Filebeat, and then how they'd get sent to their own indices. Would the above example be roughly correct? Any pointers would be appreciated!

I'm not clear on how the Logstash filter is able to differentiate between the different document types fed to it by Filebeat, and then how they'd get sent to their own indices.

Well, Filebeat sets the type according to the Filebeat configuration and the conditionals you have on the Logstash side choose which filters to apply to which events.
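
For example, this prospector from your config:

    - paths:
        - /webapp/logs/nginx_access.log
      document_type: nginx-access

makes Filebeat stamp every event from that file with type: "nginx-access". The beats input passes that through as the [type] field, which is exactly what your if [type] == "nginx-access" conditionals test - both when choosing filters and when routing to the dedicated index in the output block.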

Would the above example be roughly correct?

That configuration doesn't look too crazy, but as always the devil is in the details.

Ok, I'll give it a shot and see how it goes. Thanks again for the help!

I'm not sure where I'm going wrong. I've spotted two things:

  1. The "message" field is coming across as such:
    [16/Jun/2017:15:52:33 +0000] \"GET /static/admin/img/icon-unknown.svg HTTP/1.1\"
    I'm not sure if those escape characters before the brackets are breaking the grok pattern matching;

  2. Logs are coming through but are being stashed in the Logstash index, with the following fields:
    @timestamp (which isn't what it should be)
    _id
    _index (logstash - not what I want)
    _score
    _type (logs - not right)
    host
    message
    source

Am I missing something deeply fundamental and silly here? I hope so, because I'm scratching my head pretty hard right now...

The "message" field is coming across as such:

Where, exactly?

Logs are coming through but are being stashed in the Logstash index, with the following fields:

Please show a complete raw event, preferably via copy/paste from the JSON tab in Kibana's Discover panel.

Am I missing something deeply fundamental and silly here? I hope so, because I'm scratching my head pretty hard right now...

Please show your configuration (both Logstash and Filebeat, format them as preformatted text with the </> toolbar button).

Here's the raw event in JSON format. I note the "nginx-access" type I want is nowhere to be seen, which is peculiar - something is overriding it to "logs":

{
  "_index": "logstash-2017.06.17",
  "_type": "logs",
  "_id": "AVy18Z1kfowfuGON7coU",
  "_score": null,
  "_source": {
    "@timestamp": "2017-06-17T12:04:57.382Z",
    "host": "admin-1",
    "source": "/webapp/logs/nginx_access.log",
    "GeoLocation": {},
    "message": "82.9.88.166 - - [17/Jun/2017:12:04:51 +0000] \"GET /static/admin/js/admin/DateTimeShortcuts.js HTTP/1.1\" 200 18563 \"https://admin-d.oscptest.co/admin/auth_user/authuser/1/change/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\""
  },
  "fields": {
    "@timestamp": [
      1497701097382
    ]
  },
  "sort": [
    1497701097382
  ]
}

The Filebeat config is as follows:

filebeat:
  prospectors:
    - paths:
        - /webapp/logs/nginx_access.log
      document_type: nginx-access

output:
  logstash:
    hosts: ["logstash:5010"]

And the Logstash config is this:

input {
  beats {
    codec => "json"
    port => 5010
    host => "0.0.0.0"
  }
}
filter {
  if [type] == "nginx-access" {
    grok {
      match => {
        "message" => '%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:time}\] "%{WORD:request_action} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER:response} %{NUMBER:bytes} "%{DATA:referrer}" "%{DATA:agent}"'
      }
    }

    date {
      match => [ "time", "dd/MMM/YYYY:HH:mm:ss Z" ]
      locale => en
      target => "@timestamp"
    }

    geoip {
      source => "remote_ip"
      target => "geoip"
    }

    useragent {
      source => "agent"
      target => "user_agent"
    }
  }
}

output {
  if [type] == "nginx-access" {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      manage_template => false
      index => "nginx-access-dedicated-%{+YYYY.MM.dd}"
      document_type => "nginx-access-dedicated"
    }
  }
  else
  {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
    }
  }
}

Thanks, by the way - I appreciate you taking the time to help me with this. If you're ever in my neck of the woods, I owe you several drinks of your choice. :)

There are a few weird things here.

  • The resulting event lacks a type field. Logstash always adds that field and the configuration you've posted doesn't remove it.
  • It has a GeoLocation field. Where does it come from? No configuration you've posted adds it.

Are you sure you don't have any additional configuration files in /etc/logstash/conf.d?

There's another config file in Logstash that handles Wazuh (v2.0) events, but that one listens on port 5000, while this input listens on 5010. As the Logstash service runs in a container, it likely also has the default logstash.conf that ships with the 5.2.2 container image - I'm not sure of its contents, but I think it listens on 5044.

I can't access the system for a few hours but will confirm the Logstash default config file later - the Wazuh instance has been set up as per their V2 docs, using their default Logstash file.

Could it be that the Wazuh or Logstash default config is somehow interfering with the Nginx one despite them all being on different ports?

Some quick checking indicates that this is by design: Logstash merges all the config files in the directory into a single pipeline, so every filter and output applies to every event unless it's guarded by a conditional. The Wazuh filters were therefore processing the Nginx events too - which would explain the mystery GeoLocation field, and since that config's mutate filter removes the type field, it would also explain why the events fell through to the default index as plain "logs". I'll add conditionals to the first config to compensate and see if that fixes it.
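
(If relying on the Filebeat-supplied type across merged configs ever feels fragile, an alternative - just an untested sketch - would be to tag events per input and branch on the tags instead:)

input {
  beats {
    port => 5000
    tags => ["wazuh"]
  }
  beats {
    port => 5010
    tags => ["nginx"]
  }
}
filter {
  if "nginx" in [tags] {
    # Nginx grok/date/geoip/useragent filters here
  } else {
    # Wazuh filters here
  }
}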

My fault - need to RTFM in future.

Success! The following monolithic Logstash config file splits events properly (document_type is set to 'nginx-access' in the test host's Filebeat config):

# Wazuh - Logstash configuration file
## Remote Wazuh Manager - Filebeat input
input {
    beats {
        port => 5000
        codec => "json_lines"
#        ssl => true
#        ssl_certificate => "/etc/logstash/logstash.crt"
#        ssl_key => "/etc/logstash/logstash.key"
    }
}
filter {
    if [type] == "nginx-access" {
    # NGINX access log processing block
      grok {
        match => {
          "message" => '%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:time}\] "%{WORD:request_action} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER:response} %{NUMBER:bytes} "%{DATA:referrer}" "%{DATA:agent}"'
        }
      }

      date {
        match => [ "time", "dd/MMM/YYYY:HH:mm:ss Z" ]
        locale => en
        target => "@timestamp"
      }

      geoip {
        source => "remote_ip"
        target => "geoip"
      }

      useragent {
        source => "agent"
        target => "user_agent"
      }
    }
    else
    # WAZUH processing block
    {
      geoip {
        source => "srcip"
        target => "GeoLocation"
        fields => ["city_name", "continent_code", "country_code2", "country_name", "region_name", "location"]
      }
      date {
        match => ["timestamp", "ISO8601"]
        target => "@timestamp"
      }
      mutate {
        remove_field => [ "timestamp", "beat", "fields", "input_type", "tags", "count", "@version", "log", "offset", "type"]
      }
    }
}
output {
  if [type] == 'nginx-access' {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "nginx-access-%{+YYYY.MM.dd}"
      document_type => "nginx-access"
      template => "/etc/logstash/nginx_template.json"
      template_name => "nginx-template"
      template_overwrite => true
    }
  }
  else
  {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "wazuh-alerts-%{+YYYY.MM.dd}"
      document_type => "wazuh"
      template => "/etc/logstash/wazuh-elastic5-template.json"
      template_name => "wazuh"
      template_overwrite => true
    }
  }
}

The generated events look like this:

{
  "_index": "nginx-access-2017.06.19",
  "_type": "nginx-access",
  "_id": "AVy_QFslG0_ypPrxPPao",
  "_score": null,
  "_source": {
    "request": "/admin/",
    "request_action": "GET",
    "agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "geoip": {
      "timezone": "Europe/London",
      "ip": "XX.XX.XX.XX",
      "latitude": 51.48,
      "continent_code": "EU",
      "city_name": "Cardiff",
      "country_code2": "GB",
      "country_name": "United Kingdom",
      "country_code3": "GB",
      "region_name": "Cardiff",
      "location": [
        -3.18,
        51.48
      ],
      "postal_code": "XXXX",
      "longitude": -3.18,
      "region_code": "CRF"
    },
    "offset": 4050807,
    "user_name": "-",
    "input_type": "log",
    "http_version": "1.1",
    "source": "/webapp/logs/nginx_access.log",
    "message": "XX.XX.XX.XX - - [19/Jun/2017:07:28:01 +0000] \"GET /admin/ HTTP/1.1\" 200 996 \"https://XXXX/admin/auth/group/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\"",
    "type": "nginx-access",
    "tags": [
      "_jsonparsefailure",
      "beats_input_codec_json_applied"
    ],
    "referrer": "https://XXXX/admin/auth/group/",
    "@timestamp": "2017-06-19T07:28:01.000Z",
    "remote_ip": "XX.XX.XX.XX",
    "response": "200",
    "bytes": "996",
    "@version": "1",
    "beat": {
      "hostname": "XX",
      "name": "XXX",
      "version": "5.4.1"
    },
    "host": "XXX",
    "time": "19/Jun/2017:07:28:01 +0000",
    "user_agent": {
      "patch": "3029",
      "os": "XXXX",
      "major": "58",
      "minor": "0",
      "name": "Chrome",
      "os_name": "XXXX",
      "device": "Other"
    }
  },
  "fields": {
    "@timestamp": [
      1497857281000
    ]
  },
  "sort": [
    1497857281000
  ]
}

In this case I'm getting the template from here: https://github.com/elastic/examples/tree/master/ElasticStack_NGINX - big thanks to you for all of the help, Magnus. I need to clean up the JSON parsing issues creating those tags, but this works!
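
On those tags: the _jsonparsefailure almost certainly comes from the json_lines codec trying to parse the plain-text Nginx lines as JSON. A likely cleanup - an untested sketch - is to drop the codec from the beats input and instead decode only the Wazuh events (which should be JSON) with an explicit json filter:

input {
    beats {
        port => 5000
    }
}
filter {
    if [type] != "nginx-access" {
        # Wazuh alerts carry JSON in the message field; decode it explicitly
        json {
            source => "message"
        }
    }
}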
