I have an unusual issue.
I have a server sending logs to Elasticsearch using Filebeat.
Everything works fine until Filebeat suddenly stops sending logs. But if I curl Elasticsearch, it responds.
tcpdump shows that Filebeat doesn't even try to send logs.
If I restart Elasticsearch, everything works again for a while.
Sometimes I also need to restart Filebeat to get logs flowing again.
Nothing helpful shows up in the Elasticsearch trace logs.
I get output from Filebeat like below:
2017-09-19T18:05:16+03:00 DBG Run prospector
2017-09-19T18:05:16+03:00 DBG Start next scan
2017-09-19T18:05:16+03:00 DBG Check file for harvesting: /var/log/nginx/access.log
2017-09-19T18:05:16+03:00 DBG Update existing file for harvesting: /var/log/nginx/access.log, offset: 103391024
2017-09-19T18:05:16+03:00 DBG Check file for harvesting: /srv/current/var/logs/prod.log
2017-09-19T18:05:16+03:00 DBG Harvester for file is still running: /var/log/nginx/access.log
2017-09-19T18:05:16+03:00 DBG Update existing file for harvesting: /srv/current/var/logs/prod.log, offset: 701936
2017-09-19T18:05:16+03:00 DBG Harvester for file is still running: /srv/current/var/logs/prod.log
2017-09-19T18:05:16+03:00 DBG Prospector states cleaned up. Before: 26, After: 26
2017-09-19T18:05:16+03:00 DBG Prospector states cleaned up. Before: 61, After: 61
Filebeat version: 5.4.1
Elasticsearch version: 5.5.1
I run Elasticsearch in a Docker container: docker.elastic.co/elasticsearch/elasticsearch:5.5.1
vm.max_map_count=262144 is set.
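For reference, that kernel setting and the container are brought up roughly like this (the exact docker run flags aren't in this post, so treat this as an illustrative sketch only):

sysctl -w vm.max_map_count=262144   # required by Elasticsearch's mmap bootstrap check
# official 5.5.1 image; port mapping and single-node discovery are assumptions, not my real flags
docker run -d -p 9200:9200 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:5.5.1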
I send roughly 20 Mb of logs per minute.
Playing around with the settings didn't help.
Any ideas where to dig?
Having the debug logs is the right way to go. Any chance you could share more of the debug log?
What rotation scheme are you using? I assume the access logs are rotated and you want to finish reading them. I would recommend specifying something like access.log* in your paths so the rotated files are also read (see the config sketch below).
The few log lines you provided show 61 states, but only two files are actively harvested. So I assume the state is not being fully cleaned up.
I'm not sure what is causing your issue yet, but it's kind of strange that restarting ES sometimes helps and sometimes you also need to restart FB.
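To illustrate the glob suggestion and the state cleanup options, here is a minimal prospector sketch for Filebeat 5.x; the paths come from your log output, everything else is an assumption to adapt to your setup:

filebeat.prospectors:
  - input_type: log
    paths:
      # glob so renamed/rotated copies of the access log are also picked up
      - /var/log/nginx/access.log*
    # drop registry state for files that have been deleted from disk
    clean_removed: true
  - input_type: log
    paths:
      - /srv/current/var/logs/prod.log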
Certainly I can produce more logs. From filebeat or elasticsearch?
Are you saying that "Prospector states cleaned up. Before: 61, After: 61" means I have 61 file states open even though I only have 2 files? How can I completely clean them up?
For /var/log/nginx/access.log I have a logrotate rule.
For /srv/current/var/logs/prod.log I don't have rotation set up yet; I just run $ cat /dev/null > /srv/current/var/logs/prod.log daily.
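I haven't pasted the actual logrotate rule; a typical rename-based nginx stanza looks roughly like this (hypothetical example, not a copy of my config):

/var/log/nginx/access.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        # nginx reopens its log files when it receives USR1
        [ -f /var/run/nginx.pid ] && kill -USR1 $(cat /var/run/nginx.pid)
    endscript
}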
I should also add that everything worked well until we moved to production and started sending logs at much higher volume. My colleague noticed that if we remove the pipelines from the Elasticsearch ingest nodes, logs keep flowing for longer. Here is more of the Filebeat debug log:
2017-09-20T18:15:59+03:00 DBG Run prospector
2017-09-20T18:15:59+03:00 DBG Start next scan
2017-09-20T18:15:59+03:00 DBG Check file for harvesting: /srv/current/var/logs/prod.log
2017-09-20T18:15:59+03:00 DBG Update existing file for harvesting: /srv/current/var/logs/prod.log, offset: 75091
2017-09-20T18:15:59+03:00 DBG Harvester for file is still running: /srv/current/var/logs/prod.log
2017-09-20T18:15:59+03:00 DBG Prospector states cleaned up. Before: 63, After: 63
2017-09-20T18:16:03+03:00 ERR Failed to perform any bulk index operations: Post http://elastic:9200/_bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2017-09-20T18:16:03+03:00 INFO Error publishing events (retrying): Post http://elastic:9200/_bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2017-09-20T18:16:03+03:00 DBG send fail
2017-09-20T18:16:09+03:00 DBG Run prospector
2017-09-20T18:16:09+03:00 DBG Start next scan
2017-09-20T18:16:09+03:00 DBG Check file for harvesting: /var/log/nginx/access.log
2017-09-20T18:16:09+03:00 DBG Update existing file for harvesting: /var/log/nginx/access.log, offset: 11950635
2017-09-20T18:16:09+03:00 DBG Harvester for file is still running: /var/log/nginx/access.log
2017-09-20T18:16:09+03:00 DBG Prospector states cleaned up. Before: 30, After: 30
2017-09-20T18:16:09+03:00 DBG Run prospector
2017-09-20T18:16:09+03:00 DBG Start next scan
2017-09-20T18:16:09+03:00 DBG Check file for harvesting: /srv/current/var/logs/prod.log
2017-09-20T18:16:09+03:00 DBG Update existing file for harvesting: /srv/current/var/logs/prod.log, offset: 75091
2017-09-20T18:16:09+03:00 DBG Harvester for file is still running: /srv/current/var/logs/prod.log
2017-09-20T18:16:09+03:00 DBG Prospector states cleaned up. Before: 63, After: 63
2017-09-20T18:16:11+03:00 DBG ES Ping(url=http://elastic:9200, timeout=1m30s)
2017-09-20T18:16:11+03:00 DBG Ping status code: 200
2017-09-20T18:16:11+03:00 INFO Connected to Elasticsearch version 5.5.1
2017-09-20T18:16:11+03:00 INFO Trying to load template for client: http://elastic:9200
2017-09-20T18:16:11+03:00 DBG HEAD http://elastic:9200/_template/filebeat
2017-09-20T18:16:11+03:00 INFO Template already exists and will not be overwritten.
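For what it's worth, the "Client.Timeout exceeded while awaiting headers" line above means the _bulk request got no response within Filebeat's HTTP client timeout, which appears to be the same 1m30s visible in the ping line. The relevant knobs sit under output.elasticsearch in filebeat.yml; a minimal sketch with illustrative values, not a recommendation:

output.elasticsearch:
  hosts: ["elastic:9200"]
  # HTTP request timeout towards Elasticsearch (the 5.x default is 90s)
  timeout: 90
  # events per bulk request; smaller batches put less pressure on a busy ingest node
  bulk_max_size: 50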