How to send only the newly added log events instead of the entire content of a log file?

Hi,

I have a log file that needs to be sent to Logstash using Filebeat. The log file is ~500 MB in size. Whenever a new event is added to the log file, Filebeat sends the whole log file to Logstash. I am interested in sending only the new events to Logstash.

That does not sound right; Filebeat's default settings already work the way you describe. It only sends the new parts of the file, assuming the file is not being completely rewritten every time a new event is logged.
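
Under the hood, Filebeat remembers how far it has read into each file (per inode) in its registry file and only ships the bytes after that stored offset. Roughly the idea of this sketch (not Filebeat's actual code; the state file path is made up):

LOG=/var/www/zap-daemon/log/zap-daemon.log   # path from this thread
STATE=/tmp/zap-daemon.offset                 # made-up scratch file holding the last offset

offset=$(cat "$STATE" 2>/dev/null || echo 0)
size=$(wc -c < "$LOG")

if [ "$size" -gt "$offset" ]; then
  # ship only the bytes appended since the last run
  tail -c +"$((offset + 1))" "$LOG"
  echo "$size" > "$STATE"
fi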

Can you provide your config?

This is my configuration file.

filebeat:
  prospectors:
    -
      paths:
        - /var/www/zap-daemon/log/zap-daemon.log
        - /var/www/zap-daemon/log/test.log
      input_type: log
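      # tail_files: start reading new files at the end instead of from the beginning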
      tail_files: true
  
  registry_file: /var/lib/filebeat/registry

output:
  ### Logstash as output
  logstash:
    # The Logstash hosts
    hosts: ["172.31.59.92:5044"]
    bulk_max_size: 2048
    index: gui
    tls:
      # List of root certificates for HTTPS server verifications
      certificate_authorities: ["/etc/pki/tls/certs/logstash-forwarder.crt"]
logging:
  files:
    rotateeverybytes: 10485760 # = 10MB

Another thing happening here: initially, when I push these logs to Logstash, Kibana displays the number of hits as 117,968. But after adding a single line to the log file, Kibana shows the number of hits as 117,968 + 117,969, i.e., the whole file appears to have been indexed a second time. So when searching for a particular log entry, things are duplicated and I get the wrong count for a given time range.

Remove tail_files: true and try again.

Hi @andrewkroh,

I tried that as well, removing tail_files: true, but I am still getting the same result.

  • How is your log file written? Could it be that your logging tool creates a new file?
  • Could you post the log output from filebeat here?

Hi @ruflin,

I manually added one line to the log file to see how it shows up in Kibana.

I do not see any log file for the Filebeat service itself. Could you please tell me exactly which output you need to better understand my problem?

Here is the output from the command filebeat -e -d publish

2017/03/20 12:50:45.620875 geolite.go:24: INFO GeoIP disabled: No paths were set under output.geoip.paths
2017/03/20 12:50:45.621578 logstash.go:106: INFO Max Retries set to: 3
2017/03/20 12:50:45.704197 outputs.go:126: INFO Activated logstash as output plugin.
2017/03/20 12:50:45.704254 publish.go:232: DBG Create output worker
2017/03/20 12:50:45.704299 publish.go:274: DBG No output is defined to store the topology. The server fields might not be filled.
2017/03/20 12:50:45.704390 publish.go:288: INFO Publisher name: ip-172-31-63-75
2017/03/20 12:50:45.704533 async.go:78: INFO Flush Interval set to: 1s
2017/03/20 12:50:45.704566 async.go:84: INFO Max Bulk Size set to: 2048
2017/03/20 12:50:45.704589 async.go:92: DBG create bulk processing worker (interval=1s, bulk size=2048)
2017/03/20 12:50:45.704634 beat.go:168: INFO Init Beat: filebeat; Version: 1.3.1
2017/03/20 12:50:45.705223 beat.go:194: INFO filebeat sucessfully setup. Start running.
2017/03/20 12:50:45.705274 registrar.go:68: INFO Registry file set to: /var/lib/filebeat/registry
2017/03/20 12:50:45.705343 prospector.go:133: INFO Set ignore_older duration to 0s
2017/03/20 12:50:45.705371 prospector.go:133: INFO Set close_older duration to 1h0m0s
2017/03/20 12:50:45.705393 prospector.go:133: INFO Set scan_frequency duration to 10s
2017/03/20 12:50:45.705415 prospector.go:93: INFO Input type set to: log
2017/03/20 12:50:45.705438 prospector.go:133: INFO Set backoff duration to 1s
2017/03/20 12:50:45.705467 prospector.go:133: INFO Set max_backoff duration to 10s
2017/03/20 12:50:45.705495 prospector.go:113: INFO force_close_file is disabled
2017/03/20 12:50:45.705520 prospector.go:143: INFO Starting prospector of type: log
2017/03/20 12:50:45.705612 log.go:115: INFO Harvester started for file: /var/www/zap-daemon/log/test.log
2017/03/20 12:50:45.705727 spooler.go:77: INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2017/03/20 12:50:45.705776 log.go:115: INFO Harvester started for file: /var/www/zap-daemon/log/zap-daemon.log
2017/03/20 12:50:45.705874 crawler.go:78: INFO All prospectors initialised with 2 states to persist
2017/03/20 12:50:45.705915 registrar.go:87: INFO Starting Registrar
2017/03/20 12:50:45.705948 publish.go:88: INFO Start sending events to output

Is this file being updated? Check the ownership of the file and compare it with the user running Filebeat.
registry_file: /var/lib/filebeat/registry
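
For example (assuming Filebeat is running on this host and these are the paths from your config):

ls -l /var/lib/filebeat/registry     # who owns the registry file?
ls -l /var/www/zap-daemon/log/       # who owns the log files?
ps -o user= -C filebeat              # which user is Filebeat running as?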

This is the initial state of the registry file.

{"/var/www/zap-daemon/log/test.log":{"source":"/var/www/zap-daemon/log/test.log","offset":1036,"FileStateOS":{"inode":530079,"device":51713}},"/var/www/zap-daemon/log/zap-daemon.log":{"source":"/var/www/zap-daemon/log/zap-daemon.log","offset":13854305,"FileStateOS":{"inode":530077,"device":51713}}}

I made changes to the second file, i.e., /var/www/zap-daemon/log/zap-daemon.log.
This is the registry file after I manually added a log entry to the log file.

{"/var/www/zap-daemon/log/test.log":{"source":"/var/www/zap-daemon/log/test.log","offset":1036,"FileStateOS":{"inode":530079,"device":51713}},"/var/www/zap-daemon/log/zap-daemon.log":{"source":"/var/www/zap-daemon/log/zap-daemon.log","offset":13854322,"FileStateOS":{"inode":530073,"device":51713}}}

This file is owned by the "root" user.

I am still getting the whole log file's contents in Kibana, and when I search for a particular field, things are duplicated.

What is the command you used to add a line to the log file?

The log output you posted above is what I'm interested in, but it stops right before we would see any metrics. Please wait at least 30s until you see a metrics entry. It's best to paste it into a gist and then link it here.

I am using the vi editor to add a line to the logs.

Here is the output from the above command.

Here is the output of the filebeat log when nothing is added to the log file.

https://gist.github.com/anonymous/fe9afc51533564c81c9f31e0915ddb6c

I followed the instructions in this topic.

As mentioned in the topic above, when echo is used to add a line to the log file, the problem is solved. When the vi editor is used, the whole file is shipped to Elasticsearch.

Use echo to add lines to the log file. That solves the issue.
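
For example (the log line itself is just a placeholder), appending with echo keeps the same inode, so Filebeat's stored offset stays valid:

echo "manual test entry" >> /var/www/zap-daemon/log/zap-daemon.log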

I can confirm this behavior. I think it is because editing the file with vim gives it a new inode. Here is the output of my registry:

[{"source":"/home/martin/Dokumente/test.txt","offset":24,"FileStateOS":{"inode":278184,"device":2049},"timestamp":"2017-03-21T13:24:03.635574722+01:00","ttl":-1},{"source":"/home/martin/Dokumente/test.txt","offset":30,"FileStateOS":{"inode":278217,"device":2049},"timestamp":"2017-03-21T13:24:28.728930547+01:00","ttl":-1}]

As you can see, the inode is different. First I changed the file four times with echo, and then with vim.

I'm not a Linux expert, but here is an article about it: http://unix.stackexchange.com/questions/36467/why-inode-value-changes-when-we-edit-in-vi-editor/37177
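
You can see it for yourself with ls -i (using the file from my test above):

ls -i /home/martin/Dokumente/test.txt   # note the inode number
vim /home/martin/Dokumente/test.txt     # make a change and :wq
ls -i /home/martin/Dokumente/test.txt   # the inode has changed, so Filebeat treats it as a new file

If you really need to hand-edit a file that Filebeat is watching, vim's :set backupcopy=yes should make it overwrite the file in place and keep the inode, but for a quick test echo >> is simpler.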

Thanks for the confirmation and link.

:see_no_evil: I see that this is answered in the Logstash topic.

@maddin2016 Very interesting to know. I wasn't aware vim had this option.
