Filebeat stops sending logs for no apparent reason

I have an unusual issue.
I have a server sending logs to Elasticsearch using Filebeat.
Everything goes fine until Filebeat suddenly stops sending logs, yet if I curl Elasticsearch it still responds.
tcpdump shows that Filebeat doesn't even try to send the logs.

If I restart Elasticsearch, everything works fine for a while.
Sometimes I additionally need to restart Filebeat before it starts sending logs again.

Nothing helpful in the Elasticsearch trace logs.

The output from Filebeat looks like this:

2017-09-19T18:05:16+03:00 DBG  Run prospector
2017-09-19T18:05:16+03:00 DBG  Start next scan
2017-09-19T18:05:16+03:00 DBG  Check file for harvesting: /var/log/nginx/access.log
2017-09-19T18:05:16+03:00 DBG  Update existing file for harvesting: /var/log/nginx/access.log, offset: 103391024
2017-09-19T18:05:16+03:00 DBG  Check file for harvesting: /srv/current/var/logs/prod.log
2017-09-19T18:05:16+03:00 DBG  Harvester for file is still running: /var/log/nginx/access.log
2017-09-19T18:05:16+03:00 DBG  Update existing file for harvesting: /srv/current/var/logs/prod.log, offset: 701936
2017-09-19T18:05:16+03:00 DBG  Harvester for file is still running: /srv/current/var/logs/prod.log
2017-09-19T18:05:16+03:00 DBG  Prospector states cleaned up. Before: 26, After: 26
2017-09-19T18:05:16+03:00 DBG  Prospector states cleaned up. Before: 61, After: 61

My config is:

filebeat.prospectors:

- input_type: log
  document_type: nginx_access_log
  paths:
    - /var/log/nginx/access.log

- input_type: log
  document_type: symfony_log
  paths:
    - /srv/current/var/logs/prod.log

output.elasticsearch:
  hosts: ["elastic:9200"]
  index: "my_logs"
  max_retries: 1

Filebeat version: 5.4.1
Elasticsearch version: 5.5.1
I run Elasticsearch in a Docker container: docker.elastic.co/elasticsearch/elasticsearch:5.5.1
vm.max_map_count=262144 is set
I send roughly 20 Mb of logs per minute.

Playing around with the settings didn't help.
Any ideas where to dig?

Having the debug logs is the right way to go. Any chance you could share more of the debug log?

What rotation scheme are you using? I assume the access logs are rotated and you want to finish reading them. I would recommend specifying something like access.log* in the paths to make sure the rotated files are also read.
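
For example, only the paths entry of your existing nginx prospector would need to change (the rest is taken from your config above):

- input_type: log
  document_type: nginx_access_log
  paths:
    - /var/log/nginx/access.log*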

In the few log lines you provided above, it says there are 61 states, but you only have 2 log files active. So I assume the state is not being fully cleaned up.
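
If you also want old states to be expired, the clean_* prospector options are the usual lever. A rough sketch only (option names from the 5.x docs; the durations below are arbitrary examples, not tuned values, and clean_inactive must be larger than ignore_older plus scan_frequency):

- input_type: log
  paths:
    - /var/log/nginx/access.log*
  close_inactive: 5m      # close the file handle after 5 minutes without new lines
  ignore_older: 24h       # stop harvesting files untouched for a day
  clean_inactive: 25h     # drop the state afterwards; must be > ignore_older + scan_frequency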

I'm not sure what is causing your issue yet, but it's somewhat strange that restarting ES sometimes helps and sometimes you also need to restart Filebeat.

Certainly, I can produce more logs. From Filebeat or Elasticsearch?
Are you saying that "Prospector states cleaned up. Before: 61, After: 61" means I have 61 file pointers open even though I only have 2 files? How can I completely clean them up?

For /var/log/nginx/access.log I have this logrotate rule:

/var/log/nginx/*.log {
 daily
 missingok
 rotate 14
 compress
 delaycompress
 notifempty
 create 0640 www-data adm
 sharedscripts
 prerotate
  if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
   run-parts /etc/logrotate.d/httpd-prerotate; \
  fi \
 endscript
 postrotate
  invoke-rc.d nginx rotate >/dev/null 2>&1
 endscript
}

For /srv/current/var/logs/prod.log I don't have rotation set up yet; I just run $ cat /dev/null > /srv/current/var/logs/prod.log daily.
I would also add that everything worked well until we moved to production and started sending logs heavily. My colleague noticed that if we remove the pipelines from the Elasticsearch ingest nodes, logs keep being sent for longer.
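
For context, we point the output at the ingest pipeline roughly like this (the pipeline name below is a simplified placeholder, not our real one):

output.elasticsearch:
  hosts: ["elastic:9200"]
  index: "my_logs"
  pipeline: "nginx_access_pipeline"   # placeholder for our real ingest pipeline name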

More logs from Filebeat:

2017-09-20T18:15:59+03:00 DBG  Run prospector
2017-09-20T18:15:59+03:00 DBG  Start next scan
2017-09-20T18:15:59+03:00 DBG  Check file for harvesting: /srv/current/var/logs/prod.log
2017-09-20T18:15:59+03:00 DBG  Update existing file for harvesting: /srv/current/var/logs/prod.log, offset: 75091
2017-09-20T18:15:59+03:00 DBG  Harvester for file is still running: /srv/current/var/logs/prod.log
2017-09-20T18:15:59+03:00 DBG  Prospector states cleaned up. Before: 63, After: 63
2017-09-20T18:16:03+03:00 ERR Failed to perform any bulk index operations: Post http://elastic:9200/_bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2017-09-20T18:16:03+03:00 INFO Error publishing events (retrying): Post http://elastic:9200/_bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2017-09-20T18:16:03+03:00 DBG  send fail
2017-09-20T18:16:09+03:00 DBG  Run prospector
2017-09-20T18:16:09+03:00 DBG  Start next scan
2017-09-20T18:16:09+03:00 DBG  Check file for harvesting: /var/log/nginx/access.log
2017-09-20T18:16:09+03:00 DBG  Update existing file for harvesting: /var/log/nginx/access.log, offset: 11950635
2017-09-20T18:16:09+03:00 DBG  Harvester for file is still running: /var/log/nginx/access.log
2017-09-20T18:16:09+03:00 DBG  Prospector states cleaned up. Before: 30, After: 30
2017-09-20T18:16:09+03:00 DBG  Run prospector
2017-09-20T18:16:09+03:00 DBG  Start next scan
2017-09-20T18:16:09+03:00 DBG  Check file for harvesting: /srv/current/var/logs/prod.log
2017-09-20T18:16:09+03:00 DBG  Update existing file for harvesting: /srv/current/var/logs/prod.log, offset: 75091
2017-09-20T18:16:09+03:00 DBG  Harvester for file is still running: /srv/current/var/logs/prod.log
2017-09-20T18:16:09+03:00 DBG  Prospector states cleaned up. Before: 63, After: 63
2017-09-20T18:16:11+03:00 DBG  ES Ping(url=http://elastic:9200, timeout=1m30s)
2017-09-20T18:16:11+03:00 DBG  Ping status code: 200
2017-09-20T18:16:11+03:00 INFO Connected to Elasticsearch version 5.5.1
2017-09-20T18:16:11+03:00 INFO Trying to load template for client: http://elastic:9200
2017-09-20T18:16:11+03:00 DBG  HEAD http://elastic:9200/_template/filebeat  
2017-09-20T18:16:11+03:00 INFO Template already exists and will not be overwritten.
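
Given the bulk timeout above, I was also thinking about experimenting with the Elasticsearch output settings. Just a sketch of what I might try (option names from the Filebeat 5.x docs; the values are guesses on my side):

output.elasticsearch:
  hosts: ["elastic:9200"]
  index: "my_logs"
  max_retries: 3       # instead of the current 1
  bulk_max_size: 512   # events per bulk request (default is 50)
  timeout: 180         # seconds to wait for a response (default is 90)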

It turned out that the problem is in Elasticsearch. I will ask Elasticsearch support. Thank you anyway.
