Nginx module saves documents twice with Kubernetes autodiscover

Hi,

I am trying to use Kubernetes autodiscover with filebeat 6.4.0. It works, except that my nginx logs are stored twice: once in the original format, and once after parsing.

For example, for this log line:

10.142.0.4 - [10.142.0.4] - - [09/Jan/2019:08:49:03 +0000] "GET /auth/realms/myProject/account HTTP/1.1" 200 128 "http://localhost:4200/depart/historique" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36" 2374 0.007 c8e27131804084faac6be36e696310e0 [integration-keycloak-service-80] 10.8.1.243:8080 128 0.004 200

I have these two documents stored:

  • one not parsed:
{
  "_index": "ingress-nginx-filebeat-6.4.0-2019.01.09",
  "_type": "doc",
  "_id": "6rzNMWgBBbfCRPXqGw4Y",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2019-01-09T08:49:03.675Z",
    "host": {
      "name": "filebeat-wrrbk"
    },
    "source": "/var/lib/docker/containers/220f40ee96b95830f397af6da967715fc7cfb98f1ed4da103bce186cea560470/220f40ee96b95830f397af6da967715fc7cfb98f1ed4da103bce186cea560470-json.log",
    "stream": "stdout",
    "beat": {
      "name": "filebeat-wrrbk",
      "hostname": "filebeat-wrrbk",
      "version": "6.4.0"
    },
    "prospector": {
      "type": "docker"
    },
    "kubernetes": {
      "labels": {
        "app": "integration-ingress-nginx",
        "pod-template-hash": "2512842583"
      },
      "pod": {
        "name": "nginx-ingress-controller-integration-6956d869d7-9brtz"
      },
      "node": {
        "name": "gke-cluster-1-pool-0-da2236b1-cvkn"
      },
      "container": {
        "name": "nginx-ingress-controller"
      },
      "namespace": "ingress-nginx",
      "replicaset": {
        "name": "nginx-ingress-controller-integration-6956d869d7"
      }
    },
    "offset": 9393024,
    "message": "10.142.0.4 - [10.142.0.4] - - [09/Jan/2019:08:49:03 +0000] \"GET /auth/realms/myProject/account HTTP/1.1\" 200 128 \"http://localhost:4200/depart/historique\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36\" 2374 0.007 c8e27131804084faac6be36e696310e0 [integration-keycloak-service-80] 10.8.1.243:8080 128 0.004 200",
    "input": {
      "type": "docker"
    }
  },
  "fields": {
    "@timestamp": [
      "2019-01-09T08:49:03.675Z"
    ]
  },
  "highlight": {
    "kubernetes.labels.app": [
      "@kibana-highlighted-field@integration@/kibana-highlighted-field@-@kibana-highlighted-field@ingress@/kibana-highlighted-field@-@kibana-highlighted-field@nginx@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1547023743675
  ]
}
  • one parsed:
{
  "_index": "ingress-nginx-filebeat-6.4.0-2019.01.09",
  "_type": "doc",
  "_id": "2rzNMWgBBbfCRPXqEw4q",
  "_version": 1,
  "_score": null,
  "_source": {
    "kubernetes": {
      "container": {
        "name": "nginx-ingress-controller"
      },
      "node": {
        "name": "gke-cluster-1-pool-0-da2236b1-cvkn"
      },
      "pod": {
        "name": "nginx-ingress-controller-integration-6956d869d7-9brtz"
      },
      "namespace": "ingress-nginx",
      "replicaset": {
        "name": "nginx-ingress-controller-integration-6956d869d7"
      },
      "labels": {
        "app": "integration-ingress-nginx",
        "pod-template-hash": "2512842583"
      }
    },
    "offset": 9380999,
    "nginx": {
      "access": {
        "response_code": "200",
        "method": "GET",
        "user_name": "[10.142.0.4] - -",
        "http_version": "1.1",
        "remote_ip_list": [
          "10.142.0.4"
        ],
        "url": "/auth/realms/myProject/account",
        "referrer": "http://localhost:4200/depart/historique",
        "request_time": "0.007",
        "remote_ip": "10.142.0.4",
        "request_length": "2374",
        "body_sent": {
          "bytes": "128"
        },
        "request_id": "c8e27131804084faac6be36e696310e0",
        "user_agent": {
          "patch": "3578",
          "major": "71",
          "minor": "0",
          "os": "Linux",
          "name": "Chrome",
          "os_name": "Linux",
          "device": "Other"
        }
      }
    },
    "prospector": {
      "type": "docker"
    },
    "read_timestamp": "2019-01-09T08:49:03.675Z",
    "source": "/var/lib/docker/containers/220f40ee96b95830f397af6da967715fc7cfb98f1ed4da103bce186cea560470/220f40ee96b95830f397af6da967715fc7cfb98f1ed4da103bce186cea560470-json.log",
    "fileset": {
      "module": "nginx",
      "name": "access"
    },
    "input": {
      "type": "docker"
    },
    "@timestamp": "2019-01-09T08:49:03.000Z",
    "stream": "stdout",
    "beat": {
      "hostname": "filebeat-wrrbk",
      "name": "filebeat-wrrbk",
      "version": "6.4.0"
    },
    "host": {
      "name": "filebeat-wrrbk"
    }
  },
  "fields": {
    "@timestamp": [
      "2019-01-09T08:49:03.000Z"
    ],
    "read_timestamp": [
      "2019-01-09T08:49:03.675Z"
    ]
  },
  "highlight": {
    "kubernetes.labels.app": [
      "@kibana-highlighted-field@integration@/kibana-highlighted-field@-@kibana-highlighted-field@ingress@/kibana-highlighted-field@-@kibana-highlighted-field@nginx@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1547023743000
  ]
}

How can I avoid this?

My Kubernetes configuration is here: https://gist.github.com/olivierboudet/b0bffbdd9148709934dfb51f13b777c6

PS: I tried updating to filebeat 6.5.4, but I got "parsing CRI timestamp" errors, so I downgraded...

Hi @orgoz,

I'm checking this, is the content of filebeat-prospectors empty or did you redact it?

Best regards

Hi @exekias

Yes, filebeat-prospectors is empty; I am only using autodiscover.

I am testing 7.0.0-alpha2, and I see similar behavior even without the nginx module.

My configuration with filebeat 7.0.0-alpha2 is the following (I removed the template for nginx because of this issue: https://github.com/elastic/beats/issues/9768).

autodiscover:
        providers:
          - type: kubernetes
            templates:
              - condition:
                  equals:
                    kubernetes.labels.type: java
                config:
                  - type: docker
                    containers.ids:
                      - "${data.kubernetes.container.id}"
                    multiline.pattern: '^\d{4}-\d{2}-\d{2} ' 
                    multiline.negate: true 
                    multiline.match: after
              - config:
                  - type: docker
                    containers.ids:
                      - "${data.kubernetes.container.id}"
                    processors: 
                      - add_kubernetes_metadata: 
                          in_cluster: true 

With the second block (a default config with no condition), logs for my java applications are saved twice (I assume once because the condition kubernetes.labels.type: java matched, and a second time because of the default config).
If I remove the default config block, the behavior seems normal, with only one document inserted into Elasticsearch.
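As a workaround sketch (assuming the `not.equals` condition syntax from the 6.x autodiscover docs also works in 7.0.0-alpha2), the default block can be given an explicit negative condition so it no longer overlaps with the java template:

```yaml
autodiscover:
        providers:
          - type: kubernetes
            templates:
              - condition:
                  equals:
                    kubernetes.labels.type: java
                config:
                  - type: docker
                    containers.ids:
                      - "${data.kubernetes.container.id}"
                    multiline.pattern: '^\d{4}-\d{2}-\d{2} '
                    multiline.negate: true
                    multiline.match: after
              # fallback only for containers NOT matched above
              - condition:
                  not.equals:
                    kubernetes.labels.type: java
                config:
                  - type: docker
                    containers.ids:
                      - "${data.kubernetes.container.id}"
```

With this, each container matches exactly one template, so only one input tails each log file.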

Is this a bug or the expected behavior?

Thanks

Hi,

I am seeing similar behavior with filebeat 6.6.0.

With the configuration below, documents are stored twice:

    filebeat:
      config:
        modules:
          path: ${path.config}/modules.d/*.yml
      processors: 
        - add_kubernetes_metadata: 
            in_cluster: true 
      autodiscover:
        providers:
          - type: kubernetes
            templates:
              - condition:
                  equals:
                    kubernetes.labels.type: java
                config:
                  - type: docker
                    containers.ids:
                      - "${data.kubernetes.container.id}"
                    multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} ' 
                    multiline.negate: true 
                    multiline.match: after
              - condition:
                  contains:
                    kubernetes.labels.app: nginx
                config:
                  - module: nginx
                    access:
                      input:
                        type: docker
                        containers.stream: stdout
                        containers.ids:
                          - "${data.kubernetes.container.id}"
                    error:
                      input:
                        type: docker
                        containers.stream: stderr
                        containers.ids:
                          - "${data.kubernetes.container.id}"
              - config:
                  - type: docker
                    containers.ids:
                      - "${data.kubernetes.container.id}"
    setup:
      template:
        name: "filebeat"
        pattern: "filebeat-*"
    
    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']  
      index: "%{[kubernetes.namespace]:nonamespace}-filebeat-%{+yyyy.MM.dd}"
      pipelines:
        - pipeline: java-logs-pipeline
          when.equals:
            kubernetes.labels.type: java
        - pipeline: mongodb-logs-pipeline
          when.equals:
            kubernetes.labels.app: mongo-pod

If I replace the default template (the one with no condition) with one that has an explicit condition, the issue is resolved, like this:

    filebeat:
      config:
        modules:
          path: ${path.config}/modules.d/*.yml
      processors: 
        - add_kubernetes_metadata: 
            in_cluster: true 
      autodiscover:
        providers:
          - type: kubernetes
            templates:
              - condition:
                  equals:
                    kubernetes.labels.type: java
                config:
                  - type: docker
                    containers.ids:
                      - "${data.kubernetes.container.id}"
                    multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} ' 
                    multiline.negate: true 
                    multiline.match: after
              - condition:
                  contains:
                    kubernetes.labels.app: nginx
                config:
                  - module: nginx
                    access:
                      input:
                        type: docker
                        containers.stream: stdout
                        containers.ids:
                          - "${data.kubernetes.container.id}"
                    error:
                      input:
                        type: docker
                        containers.stream: stderr
                        containers.ids:
                          - "${data.kubernetes.container.id}"
              - condition:
                  and:
                    - not.equals:
                        kubernetes.labels.type: java
                    - not.contains:
                        kubernetes.labels.app: nginx
                config:
                  - type: docker
                    containers.ids:
                      - "${data.kubernetes.container.id}"

It looks like the default (condition-less) template is always applied, even when a preceding template's condition has already matched.

Thanks

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.