Duplicate entries


(Nicolas Grandchamp) #1

Hi.
I was running ELK on Windows, but I changed it to Linux running on Docker.
On Windows it ran fine, without duplicating the entries collected by Filebeat.

So far, my scenario is:

1 - Two apps produce txt log files on Windows, one file per day (yyyy-mm-dd.log and debug-yyyy-mm-dd.log)

2 - The logs folder of each app is network-shared so that Linux can read the logs

3 - The shares are mapped via docker-compose volumes (/media is mapped to the network-shared Windows folders):

  filebeat:
    restart: always
    build: filebeat/
    volumes:
      - /media/logs1:/logs-ws
      - /media/logs2:/logs-site
    networks:
      - elk
    depends_on:
      - logstash

4 - ELK stack running on top of Docker with docker-compose.

5 - Logstash config file (logstash.yml):

http.host: "0.0.0.0"
path.config: /usr/share/logstash/pipeline
xpack.monitoring.enabled: false

6 - Logstash pipeline config file:

input {
	beats {
		port => 5044
	}
}

output {
	elasticsearch {
		hosts => "elasticsearch:9200"
	}
}

7 - Filebeat yml config file:

filebeat.prospectors:
- input_type: log
  paths:
    - /logs-ws/*
    - /logs-site/*
  multiline:
    # lines that do NOT start with a timestamp (negate: true) are appended
    # to the preceding line (match: after); the long pattern matches a
    # 'yyyy-MM-dd HH:mm:ss.ffff' timestamp, including leap-day dates
    pattern: '^(\d{4}-((0[1-9]|1[012])-(0[1-9]|1\d|2[0-8])|(0[13456789]|1[012])-(29|30)|(0[13578]|1[02])-31)|(\d{2}[02468][048]|[13579][26])-02-29) (0[0-9]|1[0-9]|2[0-4]):(60|[0-5][0-9]):(60|[0-5][0-9]).(\d{4})'
    match: after
    negate: true
output.logstash:
  hosts: ["logstash:5044"]

In Kibana, I'm seeing duplicate entries (the entry is not duplicated in the txt file, so Filebeat is duplicating it). The two documents below have the same source, offset, and message, but different _ids and @timestamps:

{
  "_index": "logstash-2017.08.18",
  "_type": "log",
  "_id": "AV32pIUG89VkBbf3-bsa",
  "_score": 1,
  "_source": {
    "@timestamp": "2017-08-18T18:39:18.287Z",
    "offset": 335901,
    "@version": "1",
    "input_type": "log",
    "beat": {
      "hostname": "cfc086802ee0",
      "name": "cfc086802ee0",
      "version": "5.3.0"
    },
    "host": "cfc086802ee0",
    "source": "/logs-ws/debug-2017-08-18.log",
    "message": "2017-08-18 15:30:39.0286 DEBUG XXX: YYY: ZZZ, CPF:  // HTTP Status: OK\n\nRetorno: ",
    "type": "log",
    "tags": [
      "beats_input_codec_plain_applied"
    ]
  },
  "fields": {
    "@timestamp": [
      1503081558287
    ]
  }
}

{
  "_index": "logstash-2017.08.18",
  "_type": "log",
  "_id": "AV32o5nG89VkBbf3-bC3",
  "_score": 1,
  "_source": {
    "@timestamp": "2017-08-18T18:38:18.058Z",
    "offset": 335901,
    "@version": "1",
    "input_type": "log",
    "beat": {
      "hostname": "cfc086802ee0",
      "name": "cfc086802ee0",
      "version": "5.3.0"
    },
    "host": "cfc086802ee0",
    "source": "/logs-ws/debug-2017-08-18.log",
    "message": "2017-08-18 15:30:39.0286 DEBUG XXX: YYY: ZZZ, CPF:  // HTTP Status: OK\n\nRetorno: ",
    "type": "log",
    "tags": [
      "beats_input_codec_plain_applied"
    ]
  },
  "fields": {
    "@timestamp": [
      1503081498058
    ]
  }
}

Logstash logs: https://gist.github.com/grandchamp/cf888e8108ed8f99b8ae0e56f48f4dbd
Filebeat logs: https://gist.github.com/grandchamp/22f3c0d8573d8b87259d49430a285d41

I don't have any clue how to solve this problem.


(Andrew Kroh) #2

I'm pretty sure it's the network share causing issues. See "Can’t read log files from network volumes?" in the Filebeat documentation.

If you enable debug logging in Filebeat, you will probably see that it is re-reading the log files because it thinks they are new files. Filebeat uses inodes to uniquely identify files, and these values are unreliable on some network file systems.
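For reference, enabling that in filebeat.yml (5.x) looks roughly like this; a minimal sketch, and the selector list is only a guess at the components worth watching:

logging.level: debug
# optionally narrow the debug output to the file-tracking components
logging.selectors: ["prospector", "harvester", "registrar"]

With that on, the harvester messages should show whether the same file keeps being picked up as if it were a new one.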


(Nicolas Grandchamp) #3

Any idea how to solve this?
Maybe configure just Filebeat on Windows, sending to Logstash on Linux?
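Something like this minimal sketch? (The Windows paths and the Logstash address are placeholders, the multiline pattern is a simplified stand-in for the one above, and port 5044 would need to be published from the logstash container.)

filebeat.prospectors:
- input_type: log
  paths:
    # read the files locally on Windows so file identity stays stable
    - C:\apps\ws\logs\*.log
    - C:\apps\site\logs\*.log
  multiline:
    # simplified timestamp check; the stricter pattern above would work too
    pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
    match: after
    negate: true
output.logstash:
  # address of the Linux Docker host, assuming 5044 is published
  # from the logstash container (e.g. "5044:5044" under ports:)
  hosts: ["linux-docker-host:5044"]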


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.