Duplicate entries

Hi.
I was running ELK in a Windows environment, but I changed it to Linux running on Docker.
On Windows, it ran fine without duplicating the entries collected by Filebeat.

So far, my scenario is:

1 - Two apps producing txt log files on Windows, one file per day (yyyy-mm-dd.log and debug-yyyy-mm-dd.log)

2 - The logs folder of each app is shared over the network so that the Linux host can read the logs

3 - The shares are mapped via docker-compose volumes (/media on the Linux host is mapped to the network-shared Windows folders):

  filebeat:
    restart: always
    build: filebeat/
    volumes:
      - /media/logs1:/logs-ws
      - /media/logs2:/logs-site
    networks:
      - elk
    depends_on:
      - logstash
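
(One small, optional hardening, unrelated to the duplicates: since Filebeat only needs read access, the same mounts could be declared read-only:)

    volumes:
      - /media/logs1:/logs-ws:ro
      - /media/logs2:/logs-site:ro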

4 - ELK stack running on top of Docker with docker-compose.

5 - Logstash settings file (logstash.yml):

http.host: "0.0.0.0"
path.config: /usr/share/logstash/pipeline
xpack.monitoring.enabled: false

6 - Logstash pipeline config file:

input {
	beats {
		port => 5044
	}
}

output {
	elasticsearch {
		hosts => "elasticsearch:9200"
	}
}
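
By default that output lets Elasticsearch auto-generate each document's _id, so when a line is shipped twice it simply becomes a second document. One common mitigation (a sketch, not something from this thread, assuming the stock Logstash fingerprint filter) is to derive the _id from the event content, so a re-read line overwrites its earlier copy instead of duplicating it:

input {
	beats {
		port => 5044
	}
}

filter {
	# Hash each log line into a stable ID.
	fingerprint {
		source => "message"
		target => "[@metadata][fingerprint]"
		method => "SHA1"
		key => "any-static-key"
	}
}

output {
	elasticsearch {
		hosts => "elasticsearch:9200"
		# Same content => same _id, so a re-sent event overwrites the old one.
		document_id => "%{[@metadata][fingerprint]}"
	}
}

Note this only masks the symptom (Filebeat still re-reads the file), and two genuinely identical lines would also collapse into one document.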

7 - Filebeat config file (filebeat.yml):

filebeat.prospectors:
- input_type: log
  paths:
    - /logs-ws/*
    - /logs-site/*
  multiline:
    pattern: '^(\d{4}-((0[1-9]|1[012])-(0[1-9]|1\d|2[0-8])|(0[13456789]|1[012])-(29|30)|(0[13578]|1[02])-31)|(\d{2}[02468][048]|[13579][26])-02-29) (0[0-9]|1[0-9]|2[0-4]):(60|[0-5][0-9]):(60|[0-5][0-9]).(\d{4})'
    match: after
    negate: true
output.logstash:
  hosts: ["logstash:5044"]
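
With negate: true and match: after, any line that does not start with a timestamp matching the pattern is appended to the previous line that did. For example, the three physical lines below (taken from the document shown further down) are merged into a single event, which is why the message field ends up containing "\n\nRetorno: ":

2017-08-18 15:30:39.0286 DEBUG XXX: YYY: ZZZ, CPF:  // HTTP Status: OK

Retorno: 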

On Kibana, I'm seeing duplicate entries (each entry appears only once in the txt file, so Filebeat is duplicating it). Note that the two documents below have the same source, offset (335901), and message, but different _id and @timestamp values:

{
  "_index": "logstash-2017.08.18",
  "_type": "log",
  "_id": "AV32pIUG89VkBbf3-bsa",
  "_score": 1,
  "_source": {
    "@timestamp": "2017-08-18T18:39:18.287Z",
    "offset": 335901,
    "@version": "1",
    "input_type": "log",
    "beat": {
      "hostname": "cfc086802ee0",
      "name": "cfc086802ee0",
      "version": "5.3.0"
    },
    "host": "cfc086802ee0",
    "source": "/logs-ws/debug-2017-08-18.log",
    "message": "2017-08-18 15:30:39.0286 DEBUG XXX: YYY: ZZZ, CPF:  // HTTP Status: OK\n\nRetorno: ",
    "type": "log",
    "tags": [
      "beats_input_codec_plain_applied"
    ]
  },
  "fields": {
    "@timestamp": [
      1503081558287
    ]
  }
}

{
  "_index": "logstash-2017.08.18",
  "_type": "log",
  "_id": "AV32o5nG89VkBbf3-bC3",
  "_score": 1,
  "_source": {
    "@timestamp": "2017-08-18T18:38:18.058Z",
    "offset": 335901,
    "@version": "1",
    "input_type": "log",
    "beat": {
      "hostname": "cfc086802ee0",
      "name": "cfc086802ee0",
      "version": "5.3.0"
    },
    "host": "cfc086802ee0",
    "source": "/logs-ws/debug-2017-08-18.log",
    "message": "2017-08-18 15:30:39.0286 DEBUG XXX: YYY: ZZZ, CPF:  // HTTP Status: OK\n\nRetorno: ",
    "type": "log",
    "tags": [
      "beats_input_codec_plain_applied"
    ]
  },
  "fields": {
    "@timestamp": [
      1503081498058
    ]
  }
}

Logstash logs: https://gist.github.com/grandchamp/cf888e8108ed8f99b8ae0e56f48f4dbd
Filebeat logs: https://gist.github.com/grandchamp/22f3c0d8573d8b87259d49430a285d41

I don't have any clue how to solve this problem.

I'm pretty sure it's the network share causing issues. See the Filebeat FAQ entry "Can't read log files from network volumes?".

If you enable debug logging in Filebeat, you will probably see that it re-reads the log file because it thinks it's a new file. Filebeat uses inodes to uniquely identify files, and these values are unreliable on some network file systems.
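
To confirm that, debug logging can be turned on in filebeat.yml; for the 5.x series, something like this (the selector names are my assumption of where the relevant messages show up):

logging.level: debug
logging.selectors: ["prospector", "harvester", "registrar"]

If the same file keeps getting new harvesters started for it, or the registrar keeps writing new states for a file that hasn't changed, that's the inode churn. Comparing ls -li output on the /media mounts across scans is another quick check, since the inode is exactly what Filebeat keys on.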

Any idea on how to solve this?
Probably configure just Filebeat on Windows, sending to Logstash on Linux, something like the sketch below?
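
A sketch of the Windows-side config (the local paths and the linux-docker-host name are hypothetical, and port 5044 would need to be published by the logstash container, e.g. ports: - "5044:5044" in docker-compose):

filebeat.prospectors:
- input_type: log
  paths:
    # Hypothetical local NTFS paths; reading locally avoids the network
    # file system and its unstable inodes entirely.
    - C:\apps\ws\logs\*.log
    - C:\apps\site\logs\*.log
  multiline:
    # Same timestamp pattern as in the Linux config above.
    pattern: '^(\d{4}-((0[1-9]|1[012])-(0[1-9]|1\d|2[0-8])|(0[13456789]|1[012])-(29|30)|(0[13578]|1[02])-31)|(\d{2}[02468][048]|[13579][26])-02-29) (0[0-9]|1[0-9]|2[0-4]):(60|[0-5][0-9]):(60|[0-5][0-9]).(\d{4})'
    match: after
    negate: true
output.logstash:
  # Logstash running in Docker on the Linux host (hypothetical host name).
  hosts: ["linux-docker-host:5044"]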
