How can nginx's host field overwrite the host metadata that Filebeat sends to Logstash?

I am using Filebeat and Logstash to ship nginx JSON access logs on k8s.

The nginx configuration looks like this:

nginx.conf

http {
    log_format bucket escape=json
    '{'
        '"request_id": "$request_id",'
        '"method": "$request_method",'
        '"status": "$status",'
        '"forwarded_for": "$http_x_forwarded_for",'
        '"host": "$host",'
        '"url": "$request_uri",'
        '"referer": "$http_referer",'
        '"remote_ip": "$remote_addr",'
        '"server_ip": "$server_addr",'
        '"user_agent": "$http_user_agent"'
    '}';
}

server {
    access_log  /var/log/nginx/access.json  bucket;
}

Filebeat's configuration:

filebeat.yml

filebeat.shutdown_timeout: 5s

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/access.json*
    exclude_files: ['\.gz$']
    tags: ["access"]

processors:
  - decode_json_fields:
      fields: ["message"]
      process_array: true
      max_depth: 1
      target: ""
      overwrite_keys: true
      add_error_key: false

output.logstash:
  hosts: ["logstash.default.svc.cluster.local:5044"]

Here overwrite_keys is true, so the decoded fields should overwrite Filebeat's own metadata (such as host), right?

Logstash's configuration:

logstash.conf

input {
  beats {
    port => 5044
  }
}

filter {
  if "access" in [tags] {
    mutate {
      add_field => { "[@metadata][tags]" => "%{tags}" }
      remove_field => [
        "agent",
        "event",
        "service",
        "log",
        "input",
        "fileset",
        "ecs",
        "container",
        "kubernetes",
        "@timestamp",
        "@version",
        "message",
        "tags"
      ]
    }
  }
}

output {
  if "access" in [@metadata][tags] {
    google_cloud_storage {
      bucket => "nginx_logs"
      json_key_file => "/secrets/service_account/credentials.json"
      temp_directory => "/tmp/nginx_logs"
      log_file_prefix => "logstash_nginx_logs"
      max_file_size_kbytes => 1024
      output_format => "json"
      date_pattern => "%Y-%m-%dT%H:00"
      flush_interval_secs => 2
      gzip => false
      gzip_content_encoding => false
      uploader_interval_secs => 60
      include_uuid => true
      include_hostname => true
    }
  }
}

It worked well at first. The log data was written to JSON files like this:

{"user_agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 14_2_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 [FBAN/FBIOS;FBDV/iPhone13,1;FBMD/iPhone;FBSN/iOS;FBSV/14.2.1;FBSS/3;FBID/phone;FBLC/ja_JP;FBOP/5]","forwarded_for":"1.2.3.4","host":"api.mysite.com","method":"OPTIONS","request_id":"0127054b954fe4973852e1886130a6ca","referer":"https://www.world.com/","remote_ip":"2.3.4.5","server_ip":"3.4.5.6","status":"204","url":"/api/v1/post"}

But recently, output like this started to appear:

{"user_agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 14_2_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 [FBAN/FBIOS;FBDV/iPhone13,1;FBMD/iPhone;FBSN/iOS;FBSV/14.2.1;FBSS/3;FBID/phone;FBLC/ja_JP;FBOP/5]","forwarded_for":"1.2.3.4","host":"api.mysite.com","method":"OPTIONS","request_id":"0127054b954fe4973852e1886130a6ca","referer":"https://www.world.com/","remote_ip":"2.3.4.5","server_ip":"3.4.5.6","status":"204","url":"/api/v1/post"}
{"host":{"name":"filebeat-adio3"}}
{"host":{"name":"filebeat-adio3"}}
{"host":{"name":"filebeat-adio3"}}

The last three lines are not regular records. It looks like the host metadata of the Filebeat pod was sent on its own. But why? Is this caused by Filebeat or by Logstash?
Is there a better way to handle the host field so that nginx's value is sent without conflicting with Filebeat's or Logstash's metadata?
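
One possible explanation, not confirmed here: if a line's message cannot be decoded as JSON, decode_json_fields leaves the event unchanged, so host stays the object that Filebeat adds. The mutate in Logstash then removes every other listed field, and only {"host":{"name":...}} remains. A minimal sketch of a guard against that, assuming every real access record carries request_id (the log_format above always emits it), is a conditional drop placed before the existing mutate:

```conf
filter {
  if "access" in [tags] {
    # Assumption: a successfully decoded nginx record always has request_id.
    # Events without it are treated as undecoded lines and discarded.
    if ![request_id] {
      drop { }
    }
  }
}
```

This only hides the stray records. To see why decoding failed, setting add_error_key: true in decode_json_fields and temporarily keeping the message field would show the raw line and the error.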
