Filebeat + Elasticsearch creating duplicate events

I have a problem with duplicate events in Elasticsearch.
I use Filebeat to ship events from Logback logs. Here is its config:

filebeat.inputs:

- type: log

  enabled: true

  paths:
    - /#/*/#/*/#/logs/*.log
    - /#/*/#/*/#/logs/catalina.out

  document_type: container

  #exclude_lines: ['^DBG']
  #include_lines: ['^ERR', '^WARN']
  #exclude_files: ['.gz$']

  multiline.pattern: '^\['
  multiline.negate: false
  multiline.match: after
  multiline.max_lines: 10000

  ignore_older: 3m
  close_timeout: 2m
  clean_inactive: 2h
  scan_frequency: 1s

  close_renamed: true
  close_removed: true
  clean_removed: true
  fields_under_root: true
  harvester_limit: 0

- type: log

  enabled: true

  paths:
    - /#/*/#/logs/*.log

  document_type: service

  #exclude_lines: ['^DBG']
  #include_lines: ['^ERR', '^WARN']
  #exclude_files: ['.gz$']

  multiline.pattern: '^\['
  multiline.negate: false
  multiline.match: after
  multiline.max_lines: 10000

  ignore_older: 3m
  close_timeout: 2m
  clean_inactive: 2h
  scan_frequency: 1s

  close_renamed: true
  close_removed: true
  clean_removed: true
  fields_under_root: true
  harvester_limit: 0

tags: ["#####"]
#==================== Elasticsearch template setting ==========================

setup.template:
  name: 'deploy'
  pattern: '*'
  fields: "/etc/filebeat/fields.yml"
  overwrite: true
  enabled: true

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false

#============================== Dashboards =====================================
setup.dashboards.enabled: true
setup.dashboards.index: "10288-*"
setup.dashboards.url:

#============================== Kibana =====================================

setup.kibana:
  host: "deploy-elk1.mhd.local:5601"

#-------------------------- Elasticsearch output ------------------------------

output.elasticsearch:
  hosts: ["###:9200"]
  timeout: 600
#  protocol: "https"
#  username: "elastic"
#  password: "#####"
  max_retries: 0
  backoff.init: 600s
  backoff.max: 600s
  index: "10288-%{+yyyy.MM.dd-HH.mm}"

#================================ Logging =====================================

logging.level: info
#logging.selectors: ["*"]

Errors in the log:

2018-10-10T06:34:17.781-0400	ERROR	registrar/registrar.go:346	Writing of registry returned error: rename /var/lib/filebeat/registry.new /var/lib/filebeat/registry: no such file or directory. Continuing...
2018-10-10T06:34:17.939-0400	ERROR	registrar/registrar.go:346	Writing of registry returned error: rename /var/lib/filebeat/registry.new /var/lib/filebeat/registry: no such file or directory. Continuing...
2018-10-10T06:34:33.712-0400	ERROR	registrar/registrar.go:346	Writing of registry returned error: rename /var/lib/filebeat/registry.new /var/lib/filebeat/registry: no such file or directory. Continuing...
2018-10-10T06:34:52.725-0400	ERROR	registrar/registrar.go:346	Writing of registry returned error: rename /var/lib/filebeat/registry.new /var/lib/filebeat/registry: no such file or directory. Continuing...
2018-10-10T06:35:01.709-0400	ERROR	registrar/registrar.go:346	Writing of registry returned error: rename /var/lib/filebeat/registry.new /var/lib/filebeat/registry: no such file or directory. Continuing...
2018-10-10T06:35:02.886-0400	ERROR	registrar/registrar.go:346	Writing of registry returned error: rename /var/lib/filebeat/registry.new /var/lib/filebeat/registry: no such file or directory. Continuing...
2018-10-10T06:35:28.325-0400	ERROR	registrar/registrar.go:346	Writing of registry returned error: rename /var/lib/filebeat/registry.new /var/lib/filebeat/registry: no such file or directory. Continuing...

But I get 2 events for every line in the log files. What could be wrong?
Filebeat only collects information about service startups: it runs for the first 2 minutes and is then killed by a script.

Does Filebeat restart, and do you get the same log files indexed again after the restart?

Filebeat keeps track of file offsets in the registry file. If the registry file cannot be written, then you lose state between Filebeat restarts.

  1. Does the directory /var/lib/filebeat exist?
  2. Is there already a file named /var/lib/filebeat/registry, but with the wrong user permissions? (A quick check for points 1 and 2 is sketched below.)
  3. Is the /var/lib directory a local disk or some remote storage?
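
A minimal check for the first two points (a sketch; the user Filebeat runs as is unknown at this point, so api below is an assumption, matching the directory listing shared later in the thread):

    # Does the state directory exist, and who owns it?
    ls -la /var/lib/filebeat
    # Can the Filebeat user create files there? (assumes Filebeat runs as 'api')
    sudo -u api touch /var/lib/filebeat/registry.new && rm /var/lib/filebeat/registry.new && echo writable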

I get the same logs within a single 2-minute Filebeat session.
The registry file isn't important to me: it is recreated by a script before every Filebeat start (sketched below). Tailing (tail_files) in the Filebeat config does not work the way I need; it starts too late and misses the first lines. I don't need the full logs from the servers, only the logs from when the services start, even at the cost of losing a small part of the logs.
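
For context, a minimal sketch of that kind of wrapper script (hypothetical paths and timing; the real script was not shared):

    #!/bin/sh
    # Recreate Filebeat state before every start, run for ~2 minutes, then stop.
    rm -f /var/lib/filebeat/registry
    timeout 120 filebeat -e -c /etc/filebeat/filebeat.yml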

Example events (Elasticsearch receive time, then the event):

October 15th 2018, 11:37:36.954
2018-10-15 05:37:29,658 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'status'

October 15th 2018, 11:37:36.954
2018-10-15 05:37:29,658 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'status'

October 15th 2018, 11:37:36.954
2018-10-15 05:37:30,646 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'rest'

October 15th 2018, 11:37:36.954
2018-10-15 05:37:30,646 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'rest'

October 15th 2018, 11:37:36.954
2018-10-15 05:37:31,172 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'jsonrpc'

October 15th 2018, 11:37:36.954
2018-10-15 05:37:31,172 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'jsonrpc'

October 15th 2018, 11:37:36.954
2018-10-15 05:37:31,238 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'remoting'

October 15th 2018, 11:37:36.954
2018-10-15 05:37:31,238 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'remoting'

October 15th 2018, 11:37:36.954
2018-10-15 05:37:31,445 +0000 [localhost-startStop-1] [org.apache.catalina.startup.HostConfig] INFO Deployment of web application archive /*.war has finished in 24,323 ms

October 15th 2018, 11:37:32.023
2018-10-15 05:37:31,172 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'jsonrpc'

October 15th 2018, 11:37:32.023
2018-10-15 05:37:31,172 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'jsonrpc'

October 15th 2018, 11:37:32.023
2018-10-15 05:37:31,238 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'remoting'

October 15th 2018, 11:37:32.023
2018-10-15 05:37:31,238 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'remoting'

October 15th 2018, 11:37:32.023
2018-10-15 05:37:31,445 +0000 [localhost-startStop-1] [org.apache.catalina.startup.HostConfig] INFO Deployment of web application archive /*.war has finished in 24,323 ms

October 15th 2018, 11:37:31.023
2018-10-15 05:37:30,646 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'rest'

October 15th 2018, 11:37:31.023
2018-10-15 05:37:30,646 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'rest'

October 15th 2018, 11:37:30.022
2018-10-15 05:37:29,658 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'status'

October 15th 2018, 11:37:30.022
2018-10-15 05:37:29,658 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring FrameworkServlet 'status'

October 15th 2018, 11:37:23.021
2018-10-15 05:37:07,122 +0000 [localhost-startStop-1] [org.apache.catalina.startup.HostConfig] INFO Deploying web application archive /*.war

October 15th 2018, 11:37:23.021
2018-10-15 05:37:19,326 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO No Spring WebApplicationInitializer types detected on classpath

October 15th 2018, 11:37:23.021
2018-10-15 05:37:19,384 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Logback from [file:/*/../../config/logback.xml]

October 15th 2018, 11:37:23.021
2018-10-15 05:37:19,466 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Removing all previous handlers for JUL to SLF4J bridge

October 15th 2018, 11:37:23.021
2018-10-15 05:37:19,466 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Installing JUL to SLF4J bridge

October 15th 2018, 11:37:23.021
2018-10-15 05:37:19,489 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring root WebApplicationContext

October 15th 2018, 11:37:23.021
2018-10-15 05:37:19,489 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring root WebApplicationContext

October 15th 2018, 11:37:21.950
2018-10-15 05:37:19,466 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Installing JUL to SLF4J bridge

October 15th 2018, 11:37:21.950
2018-10-15 05:37:19,489 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring root WebApplicationContext

October 15th 2018, 11:37:21.950
2018-10-15 05:37:19,489 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Spring root WebApplicationContext

October 15th 2018, 11:37:21.850
2018-10-15 05:37:07,122 +0000 [localhost-startStop-1] [org.apache.catalina.startup.HostConfig] INFO Deploying web application archive /*.war

October 15th 2018, 11:37:21.850
2018-10-15 05:37:19,326 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO No Spring WebApplicationInitializer types detected on classpath

October 15th 2018, 11:37:21.850
2018-10-15 05:37:19,384 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Initializing Logback from [file:/*/../../config/logback.xml]

October 15th 2018, 11:37:21.850
2018-10-15 05:37:19,466 +0000 [localhost-startStop-1] [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/]] INFO Removing all previous handlers for JUL to SLF4J bridge

Why is the registry file not important? Filebeat uses the registry file to keep state between restarts. Otherwise you can get duplicates.
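
For context, the 6.x registry is a plain JSON file mapping each harvested file to its last acknowledged offset. An abridged, illustrative entry (all values invented) looks like:

    [
      {
        "source": "/home/api/api/tomcat/api/logs/container.log",
        "offset": 30313,
        "timestamp": "2018-10-22T04:52:56.840Z",
        "ttl": 7200000000000,
        "type": "log",
        "FileStateOS": { "inode": 123456, "device": 2049 }
      }
    ]

If this file cannot be persisted, each start begins from whatever state the wrapper script left behind.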

Questions:

  1. Does Filebeat restart, and do you get the same log files indexed again after the restart?
  2. Is the /var/lib directory a local disk or some remote storage?
  3. Any output-related errors in the Filebeat logs?

Answers:

  1. No; judging by the timestamps on the arriving logs, they are sent at the same time.
  2. On a local disk.
  3. Only about the registry file.

Filebeat uses at-least-once delivery semantics. That is, if there is an output error, it sends the same event again. This can lead to duplicates at times. Another source of duplicates is two Filebeat inputs configured to read the same file.
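
One common mitigation for at-least-once duplicates (a sketch for Filebeat 7.x and later; the processor does not exist in the 6.3.2 version used in this thread) is to derive a deterministic document ID from the event, so a retried publish overwrites the same document instead of creating a new one:

    processors:
      - fingerprint:
          fields: ["message", "log.file.path"]  # fields assumed stable per event
          target_field: "@metadata._id"         # used as the Elasticsearch _id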

The fact that /var/lib/filebeat is local, but you get write errors, is pretty weird. Can you share the output of ls -la /var/lib/filebeat? Which user do you run Filebeat as?

Can you enable debug logs and check the logs from the Elasticsearch output?

Have you checked Elasticsearch logs for any IO errors?
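
For reference, debug logging can be scoped to the relevant subsystems from the command line; the selectors match the bracketed names visible in the logs below:

    filebeat -e -c /etc/filebeat/filebeat.yml -d "elasticsearch,registrar,publish"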

Here is a tcpdump from the Filebeat host:
tcpdump -A -s 1500 -n -i eth0 dst host elasticsearch.local | grep finished

{"@timestamp":"2018-10-22T04:52:56.840Z","tags":["api.local"],"input":{"type":"log"},"prospector":{"type":"log"},"beat":{"name":"api","hostname":"api","version":"6.3.2"},"host":{"name":"api"},"source":"/home/api/api/tomcat/api/logs/container.log","offset":30313,"message":"2018-10-22 04:52:52,916 +0000 [localhost-startStop-1] [org.apache.catalina.startup.HostConfig] INFO Deployment of web application archive /home/api/api/tomcat/api/webapps/api.war has finished in 25,251 ms"}
{"@timestamp":"2018-10-22T04:52:57.898Z","host":{"name":"api"},"beat":{"name":"api","hostname":"api","version":"6.3.2"},"source":"/home/api/api/tomcat/api/logs/container.log","offset":30313,"message":"2018-10-22 04:52:52,916 +0000 [localhost-startStop-1] [org.apache.catalina.startup.HostConfig] INFO Deployment of web application archive /home/api/api/tomcat/api/webapps/api.war has finished in 25,251 ms","tags":["api.local"],"prospector":{"type":"log"},"input":{"type":"log"}}

In the Filebeat debug log I have only 1 message (cross-checked against the duplicates in tcpdump):

"message": "2018-10-22 05:19:13,514 +0000 [localhost-startStop-1] [org.apache.catalina.startup.HostConfig] INFO Deployment of web application archive /home/api/api/tomcat/api/webapps/api.war has finished in 32,606 ms",

On the Elasticsearch host:
tcpdump -A -s 1500 -n -i eth0 host api.local | grep finished

{"@timestamp":"2018-10-22T04:52:56.840Z","tags":["api.local"],"input":{"type":"log"},"prospector":{"type":"log"},"beat":{"name":"api","hostname":"api","version":"6.3.2"},"host":{"name":"api"},"source":"/home/api/api/tomcat/api/logs/container.log","offset":30313,"message":"2018-10-22 04:52:52,916 +0000 [localhost-startStop-1] [org.apache.catalina.startup.HostConfig] INFO Deployment of web application archive /home/api/api/tomcat/api/webapps/api.war has finished in 25,251 ms"}
{"@timestamp":"2018-10-22T04:52:57.898Z","host":{"name":"api"},"beat":{"name":"api","hostname":"api","version":"6.3.2"},"source":"/home/api/api/tomcat/api/logs/container.log","offset":30313,"message":"2018-10-22 04:52:52,916 +0000 [localhost-startStop-1] [org.apache.catalina.startup.HostConfig] INFO Deployment of web application archive /home/api/api/tomcat/api/webapps/api.war has finished in 25,251 ms","tags":["api.local"],"prospector":{"type":"log"},"input":{"type":"log"}}

ls -la /var/lib/filebeat:

ls -al
total 20
drwxrwx--x 2 api api 4096 Oct 22 00:54 .
drwxr-xr-x 48 root root 4096 Aug 8 04:04 ..
-rw-rw---- 1 api api 48 Aug 8 04:04 meta.json
-rw------- 1 api api 4311 Oct 22 00:54 registry

Logs:
Elasticsearch:

[2018-10-22T01:18:55,466][INFO ][o.e.c.m.MetaDataIndexTemplateService] [deploy-elk1] adding template [kibana_index_template:.kibana] for index patterns [.kibana]
[2018-10-22T01:18:56,039][INFO ][o.e.c.m.MetaDataIndexTemplateService] [deploy-elk1] adding template [kibana_index_template:.kibana] for index patterns [.kibana]
[2018-10-22T01:19:00,377][INFO ][o.e.c.m.MetaDataIndexTemplateService] [deploy-elk1] adding template [deploy] for index patterns []
[2018-10-22T01:19:00,401][INFO ][o.e.c.m.MetaDataCreateIndexService] [deploy-elk1] [10288-2018.10.22-01.18] creating index, cause [auto(bulk api)], templates [deploy], shards [1]/[1], mappings [doc]
[2018-10-22T01:19:00,464][INFO ][o.e.c.m.MetaDataMappingService] [deploy-elk1] [10288-2018.10.22-01.18/fVhnfv2yT62BAE8tc_IaRg] update_mapping [doc]
[2018-10-22T01:19:13,335][INFO ][o.e.c.m.MetaDataCreateIndexService] [deploy-elk1] [10288-2018.10.22-01.19] creating index, cause [auto(bulk api)], templates [deploy], shards [1]/[1], mappings [doc]
[2018-10-22T01:19:13,394][INFO ][o.e.c.m.MetaDataMappingService] [deploy-elk1] [10288-2018.10.22-01.19/4hKQR7DwQJ2uyWHBe_9fsw] update_mapping [doc]
[2018-10-22T01:19:14,129][INFO ][o.e.c.m.MetaDataIndexTemplateService] [deploy-elk1] adding template [deploy] for index patterns []
[2018-10-22T01:20:01,907][INFO ][o.e.c.m.MetaDataCreateIndexService] [deploy-elk1] [10288-2018.10.22-01.20] creating index, cause [auto(bulk api)], templates [deploy], shards [1]/[1], mappings [doc]
[2018-10-22T01:20:01,985][INFO ][o.e.c.m.MetaDataMappingService] [deploy-elk1] [10288-2018.10.22-01.20/CuzYkkS0TtqmFcUt3VXVTA] update_mapping [doc]

Filebeat:

2018-10-22T01:18:57.259-0400 DEBUG [registrar] registrar/registrar.go:97 Registry file set to: /var/lib/filebeat/registry
2018-10-22T01:18:57.259-0400 INFO registrar/registrar.go:117 Loading registrar data from /var/lib/filebeat/registry
2018-10-22T01:18:57.260-0400 INFO registrar/registrar.go:124 States Loaded from registrar: 19
2018-10-22T01:18:57.260-0400 INFO crawler/crawler.go:48 Loading Inputs: 2
2018-10-22T01:18:57.260-0400 DEBUG [registrar] registrar/registrar.go:250 Starting Registrar
2018-10-22T01:18:57.260-0400 DEBUG [processors] processors/processor.go:49 Processors:
2018-10-22T01:18:57.260-0400 DEBUG [input] log/config.go:178 recursive glob enabled
2018-10-22T01:18:57.260-0400 DEBUG [input] log/input.go:127 exclude_files: []. Number of stats: 19
2018-10-22T01:18:57.261-0400 DEBUG [input] file/states.go:51 New state added for /home/api/api/tomcat/api/logs/container.log
2018-10-22T01:18:57.261-0400 DEBUG [acker] beater/acker.go:47 stateful ack {"count": 1}
2018-10-22T01:18:58.844-0400 INFO [monitoring] log/log.go:124 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":20,"time"
2018-10-22T01:18:58.922-0400 ERROR registrar/registrar.go:346 Writing of registry returned error: rename /var/lib/filebeat/registry.new /var/lib/filebeat/registry: no such
2018-10-22T01:18:59.133-0400 DEBUG [registrar] registrar/registrar.go:328 Processing 1 events
2018-10-22T01:18:59.133-0400 DEBUG [registrar] registrar/registrar.go:298 Registrar state updates processed. Count: 1
2018-10-22T01:18:59.133-0400 DEBUG [registrar] registrar/registrar.go:318 Registrar states cleaned up. Before: 19, After: 19, Pending: 3
2018-10-22T01:18:59.133-0400 DEBUG [registrar] registrar/registrar.go:383 Write registry file: /var/lib/filebeat/registry
2018-10-22T01:18:59.133-0400 DEBUG [input] log/input.go:148 input with previous states loaded: 6
2018-10-22T01:18:59.133-0400 INFO log/input.go:118 Configured paths: [/home/*/api/*/api/logs/*.log /home/*/api/*/api/logs/catalina.out]
2018-10-22T01:18:59.133-0400 INFO input/input.go:88 Starting input of type: log; ID: 2452345
2018-10-22T01:18:59.133-0400 DEBUG [acker] beater/acker.go:47 stateful ack {"count": 1}
2018-10-22T01:18:59.133-0400 DEBUG [input] log/input.go:154 Start next scan
2018-10-22T01:18:59.133-0400 DEBUG [processors] processors/processor.go:49 Processors:
2018-10-22T01:18:59.133-0400 DEBUG [input] log/config.go:178 recursive glob enabled

2018-10-22T01:19:13.043-0400 DEBUG [input] log/input.go:384 Check file for harvesting: /home/api/api/tomcat/api/logs/container.log
2018-10-22T01:19:13.043-0400 DEBUG [input] log/input.go:470 Update existing file for harvesting: /home/api/api/tomcat/api/logs/container.log, offset: 36648
2018-10-22T01:19:13.043-0400 DEBUG [input] log/input.go:479 Resuming harvesting of file: /home/api/api/tomcat/api/logs/container.log, offset: 36648, new si
2018-10-22T01:19:13.044-0400 DEBUG [harvester] log/harvester.go:458 Set previous offset for file: /home/api/api/tomcat/api/logs/container.log. Offset: 3664
2018-10-22T01:19:13.044-0400 DEBUG [harvester] log/harvester.go:449 Setting offset for file: /home/api/api/tomcat/api/logs/container.log. Offset: 36648

2018-10-22T01:20:26.935-0400 DEBUG [elasticsearch] elasticsearch/client.go:303 PublishEvents: 3 events have been published to elasticsearch in 8.15451ms.
2018-10-22T01:20:26.935-0400 DEBUG [memqueue] memqueue/ackloop.go:143 ackloop: receive ack [74: 0, 3]
2018-10-22T01:20:26.935-0400 DEBUG [memqueue] memqueue/eventloop.go:518 broker ACK events: count=3, start-seq=1385, end-seq=1387

2018-10-22T01:20:26.935-0400 DEBUG [memqueue] memqueue/ackloop.go:111 ackloop: return ack to broker loop:3
2018-10-22T01:20:26.935-0400 DEBUG [memqueue] memqueue/ackloop.go:114 ackloop: done send ack
2018-10-22T01:20:26.935-0400 DEBUG [acker] beater/acker.go:47 stateful ack {"count": 3}
2018-10-22T01:20:26.935-0400 DEBUG [registrar] registrar/registrar.go:328 Processing 3 events
2018-10-22T01:20:26.935-0400 DEBUG [registrar] registrar/registrar.go:298 Registrar state updates processed. Count: 3
2018-10-22T01:20:26.935-0400 DEBUG [registrar] registrar/registrar.go:318 Registrar states cleaned up. Before: 20, After: 20, Pending: 20
2018-10-22T01:20:26.936-0400 DEBUG [registrar] registrar/registrar.go:383 Write registry file: /var/lib/filebeat/registry
2018-10-22T01:20:26.936-0400 DEBUG [input] input/input.go:125 Run input
2018-10-22T01:20:26.936-0400 DEBUG [input] log/input.go:154 Start next scan

2018-10-22T01:20:26.955-0400 ERROR registrar/registrar.go:346 Writing of registry returned error: rename /var/lib/filebeat/registry.new /var/lib/filebeat/registry: no such file or directory. Continuing...
2018-10-22T01:20:26.970-0400 DEBUG [input] input/input.go:125 Run input
2018-10-22T01:20:26.971-0400 DEBUG [input] log/input.go:154 Start next scan

2018-10-22T01:20:26.972-0400 DEBUG [input] log/input.go:470 Update existing file for harvesting: /home/api/api/tomcat/api/logs/container.log, offset: 39702
2018-10-22T01:20:26.972-0400 DEBUG [input] log/input.go:522 Harvester for file is still running: /home/api/api/tomcat/api/logs/container.log
2018-10-22T01:20:26.972-0400 DEBUG [input] log/input.go:175 input states cleaned up. Before: 7, After: 7, Pending: 7
--- --- --- --- ---
2018-10-22T01:20:27.927-0400 DEBUG [publish] pipeline/processor.go:291 Publish event: {

}
2018-10-22T01:20:27.938-0400 DEBUG [input] input/input.go:125 Run input
2018-10-22T01:20:27.938-0400 DEBUG [input] log/input.go:154 Start next scan
2018-10-22T01:20:28.123-0400 DEBUG [service] service/service.go:34 Received sigterm/sigint, stopping
2018-10-22T01:20:28.123-0400 INFO beater/filebeat.go:420 Stopping filebeat
2018-10-22T01:20:28.123-0400 INFO crawler/crawler.go:109 Stopping Crawler
2018-10-22T01:20:28.123-0400 INFO crawler/crawler.go:119 Stopping 2 inputs
2018-10-22T01:20:28.123-0400 INFO input/input.go:122 input ticker stopped
2018-10-22T01:20:28.123-0400 INFO input/input.go:139 Stopping Input: 14018790516420757715
2018-10-22T01:20:28.123-0400 DEBUG [publish] pipeline/client.go:131 client: closing acker
2018-10-22T01:20:28.123-0400 DEBUG [publish] pipeline/client.go:133 client: done closing acker
2018-10-22T01:20:28.123-0400 INFO input/input.go:122 input ticker stopped
2018-10-22T01:20:28.123-0400 INFO input/input.go:139 Stopping Input: 4425958835838708591
2018-10-22T01:20:28.123-0400 INFO harvester/forwarder.go:35 Input outlet closed

Please format logs using the </> button in the editor window.

Checking the logs, I found this:

2018-10-22T01:19:13.043-0400 DEBUG [input] log/input.go:479 Resuming harvesting of file: /home/api/api/tomcat/api/logs/container.log, offset: 36648, new si
2018-10-22T01:19:13.044-0400 DEBUG [harvester] log/harvester.go:458 Set previous offset for file: /home/api/api/tomcat/api/logs/container.log. Offset: 3664

The file offset is reset to an older location and then to another one. To me this looks like your two configured inputs read the same file. No two inputs may read the same file in Filebeat.
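
To illustrate the kind of overlap meant here (paths hypothetical, since the originals are redacted): with the globs below, a single file such as /home/api/api/tomcat/api/logs/container.log matches both inputs (in the second pattern, * matches api and tomcat), so each input harvests it independently and every line is shipped twice:

    - type: log
      paths:
        - /home/api/api/tomcat/api/logs/*.log   # literal match
    - type: log
      paths:
        - /home/*/api/*/api/logs/*.log          # also matches the same file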

I don't know how it collects 2 messages. I tried disabling and deleting the second log section in the settings. It didn't help; Filebeat still makes duplicates. Config:

- type: log

  enabled: true

  paths:
    - /home/api/api/tomcat/api/logs/container.log
    - /home/api/api/tomcat/api/logs/catalina.out

  document_type: container

  multiline.pattern: '^\['
  multiline.negate: false
  multiline.match: after
  multiline.max_lines: 100

  ignore_older: 3m
  close_timeout: 2m
  clean_inactive: 2h
  scan_frequency: 1s

  close_renamed: true
  close_removed: true
  clean_removed: true
  fields_under_root: true
  harvester_limit: 0

- type: log

  enabled: false

  paths:
    - /home/*/tapi/logs/*.log

  document_type: service

And I get:

[api@api1:filebeat]$ grep -i harvest filebeat | grep -i previous
2018-10-28T04:13:11.079-0400	DEBUG	[harvester]	log/harvester.go:458	Set previous offset for file: /home/api/api/tomcat/api/logs/container.log. Offset: 9162 
[api@api1:filebeat]$ grep -i harvest filebeat | grep -i Resuming
2018-10-28T04:13:11.078-0400	DEBUG	[input]	log/input.go:479	Resuming harvesting of file: /home/api/api/tomcat/api/logs/container.log, offset: 9162, new size: 10523

What can I do with this file?

Hi, here is the full log and a screenshot of the problem:
http://rgho.st/7JVBl9B97

In the last one there should be only 16 messages, not 32.

Config:

filebeat.inputs:

- type: log

  enabled: true

  paths:
#    - /home/*/api/*/api/logs/*.log
    - /home/api/api/tomcat/api/logs/container.log
#    - /home/api/api/tomcat/api/logs/catalina.out

  document_type: container

  #exclude_lines: ['^DBG']
  #include_lines: ['^ERR', '^WARN']
  #exclude_files: ['.gz$']

  multiline.pattern: '^\['
  multiline.negate: false
  multiline.match: after
  multiline.max_lines: 10000

  ignore_older: 3m
  close_timeout: 2m
  clean_inactive: 2h
  scan_frequency: 1s

  close_renamed: true
  close_removed: true
  clean_removed: true
  fields_under_root: true
  harvester_limit: 0

- type: log

  enabled: false

  paths:
    - /home/*/api/logs/*.log

  document_type: service

  #exclude_lines: ['^DBG']
  #include_lines: ['^ERR', '^WARN']
  #exclude_files: ['.gz$']

  multiline.pattern: '^\['
  multiline.negate: false
  multiline.match: after
  multiline.max_lines: 10000

  ignore_older: 3m
  close_timeout: 2m
  clean_inactive: 2h
  scan_frequency: 1s

  close_renamed: true
  close_removed: true
  clean_removed: true
  fields_under_root: true
  harvester_limit: 0

tags: ["api1-repl.mhd.local"]

#==================== Elasticsearch template setting ==========================

setup.template:
  name: 'deploy'
  pattern: '*'
  fields: "/etc/filebeat/fields.yml"
  overwrite: true
  enabled: true

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false

#============================== Dashboards =====================================
setup.dashboards.enabled: true
setup.dashboards.index: "10288-*"
setup.dashboards.url:

#============================== Kibana =====================================

setup.kibana:
  host: "deploy-elk1.mhd.local:5601"

#-------------------------- Elasticsearch output ------------------------------

output.elasticsearch:
  hosts: ["deploy-elk1.mhd.local:9200"]
  timeout: 600
  max_retries: 0
  backoff.init: 600s
  backoff.max: 600s
  index: "10288-%{+yyyy.MM.dd-HH.mm}"

#================================ Logging =====================================

logging.level: debug
#logging.selectors: ["*"]

filebeat.shutdown_timeout: 0

fields.yml:

- key: beat
  title: Beat
  description: >
    Contains common beat fields available in all event types.
  fields:

    - name: beat.name
      description: >
        The name of the Beat sending the log messages. If the Beat name is
        set in the configuration file, then that value is used. If it is not
        set, the hostname is used. To set the Beat name, use the `name`
        option in the configuration file.

    - name: beat.hostname
      description: >
        The hostname as returned by the operating system on which the Beat is
        running.

    - name: beat.timezone
      description: >
        The timezone as returned by the operating system on which the Beat is
        running.

    - name: "@timestamp"
      type: date
      required: true
      format: date
      example: August 26th 2016, 12:35:53.332
      description: >
        The timestamp when the event log record was generated.

    - name: tags
      description: >
        Arbitrary tags that can be set per Beat and per transaction
        type.

    - name: fields
      type: object
      object_type: keyword
      description: >
        Contains user configurable fields.

    - name: error
      type: group
      description: >
        Error fields containing additional info in case of errors.
      fields:
        - name: message
          type: text
          description: >
            Error message.
        - name: code
          type: long
          description: >
            Error code.
        - name: type
          type: keyword
          description: >
            Error type.

- key: log
  title: Log file content
  description: >
    Contains log file lines.
  fields:

    - name: source
      type: keyword
      required: true
      description: >
        The file from which the line was read. This field contains the absolute
        path to the file. For example: /var/log/system.log.

    - name: message
      type: text
      ignore_above: 0
      required: true
      description: >
        The content of the line read from the log file.

    - name: input.type
      required: true
      description: >
        The input type from which the event was generated. This field is set
        to the value specified for the `type` option in the input section of
        the Filebeat config file.

    - name: fileset.module
      description: >
        The Filebeat module that generated this event.

    - name: fileset.name
      description: >
        The Filebeat fileset that generated this event.

    - name: syslog.facility
      type: long
      required: false
      description: >
        The facility extracted from the priority.

    - name: syslog.priority
      type: long
      required: false
      description: >
        The priority of the syslog event.

    - name: syslog.severity_label
      type: keyword
      required: false
      description: >
        The human readable severity.

    - name: syslog.facility_label
      type: keyword
      required: false
      description: >
        The human readable facility.

    - name: process.program
      type: keyword
      required: false
      description: >
        The name of the program.

    - name: process.pid
      type: long
      required: false
      description: >
        The pid of the process.

Interestingly, there are no errors in the log, but from the metrics log I can tell Filebeat retries publishing events. Any errors in Elasticsearch?

Why do you have a new index being created every minute? This will create a huge number of indices and shards, which is very inefficient. Once you reach a reasonably large number of indices, updates to the cluster state can get slow and cause performance problems. If you have a large number of indices in your cluster, this could very well be part of the reason why Filebeat has to retry. I would recommend dramatically reducing the number of indices in the cluster and changing this to generate daily indices instead, to see what effect that has.
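
For example, dropping the hour and minute from the index name in the Elasticsearch output yields one index per day instead of one per minute (a sketch against the config shown above):

    output.elasticsearch:
      hosts: ["deploy-elk1.mhd.local:9200"]
      index: "10288-%{+yyyy.MM.dd}"   # daily instead of per-minute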

I collect data only for 2 minutes after an application restarts following a deploy, and I delete the indices within a week. I have now tried removing -HH.mm from the index names, and it didn't help.

How many indices and shards do you have in the cluster at the moment?

Log from Elasticsearch:
http://rgho.st/7ZdrCCVZ9

On the test environment I have 48 indices from 10 servers for the last week.
On production the count will be 2,800 per week at maximum; that is the worst case, and I think it will not be more than 1,500. If I switch to daily indices, the figure will be cut roughly in half (divided by about 2.1). Service restarts are a rare occurrence.

Can you please provide the full output of the cluster stats API? This would provide us with a good amount of information about the current state of the cluster.
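
For reference, that output can be fetched with (hostname taken from the config above):

    curl -s 'http://deploy-elk1.mhd.local:9200/_cluster/stats?human&pretty'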

Here is the output:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "deploy-elk",
  "timestamp" : 1541499924420,
  "status" : "yellow",
  "indices" : {
    "count" : 48,
    "shards" : {
      "total" : 48,
      "primaries" : 48,
      "replication" : 0.0,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "primaries" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "replication" : {
          "min" : 0.0,
          "max" : 0.0,
          "avg" : 0.0
        }
      }
    },
    "docs" : {
      "count" : 357520,
      "deleted" : 107
    },
    "store" : {
      "size" : "54.3mb",
      "size_in_bytes" : 57018998
    },
    "fielddata" : {
      "memory_size" : "1kb",
      "memory_size_in_bytes" : 1024,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "0b",
      "memory_size_in_bytes" : 0,
      "total_count" : 0,
      "hit_count" : 0,
      "miss_count" : 0,
      "cache_size" : 0,
      "cache_count" : 0,
      "evictions" : 0
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 201,
      "memory" : "798.4kb",
      "memory_in_bytes" : 817571,
      "terms_memory" : "596.1kb",
      "terms_memory_in_bytes" : 610443,
      "stored_fields_memory" : "94.5kb",
      "stored_fields_memory_in_bytes" : 96840,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "8.1kb",
      "norms_memory_in_bytes" : 8320,
      "points_memory" : "6.6kb",
      "points_memory_in_bytes" : 6860,
      "doc_values_memory" : "92.8kb",
      "doc_values_memory_in_bytes" : 95108,
      "index_writer_memory" : "0b",
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory" : "0b",
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set" : "0b",
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : 1541482815718,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 1,
      "data" : 1,
      "coordinating_only" : 0,
      "master" : 1,
      "ingest" : 1
    },
    "versions" : [
      "6.3.2"
    ],
    "os" : {
      "available_processors" : 2,
      "allocated_processors" : 2,
      "names" : [
        {
          "name" : "Linux",
          "count" : 1
        }
      ],
      "mem" : {
        "total" : "7.7gb",
        "total_in_bytes" : 8372527104,
        "free" : "313.5mb",
        "free_in_bytes" : 328806400,
        "used" : "7.4gb",
        "used_in_bytes" : 8043720704,
        "free_percent" : 4,
        "used_percent" : 96
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 3
      },
      "open_file_descriptors" : {
        "min" : 493,
        "max" : 493,
        "avg" : 493
      }
    },
    "jvm" : {
      "max_uptime" : "2.5h",
      "max_uptime_in_millis" : 9075707,
      "versions" : [
        {
          "version" : "1.8.0_121-api_p3",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "25.121-b13",
          "vm_vendor" : "Oracle Corporation",
          "count" : 1
        }
      ],
      "mem" : {
        "heap_used" : "171.4mb",
        "heap_used_in_bytes" : 179759000,
        "heap_max" : "2.9gb",
        "heap_max_in_bytes" : 3203792896
      },
      "threads" : 48
    },
    "fs" : {
      "total" : "19.5gb",
      "total_in_bytes" : 20999553024,
      "free" : "6.8gb",
      "free_in_bytes" : 7363575808,
      "available" : "6gb",
      "available_in_bytes" : 6468833280
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 1
      },
      "http_types" : {
        "security4" : 1
      }
    }
  }
}