Filebeat needs to be restarted in order to send logs to logstash dynamically


(Abinay) #1

Hi, I was first using logstash-forwarder to ship my logs to Logstash, but unfortunately logstash-forwarder was not tailing the file. It used to start reading the file from the beginning, which is not a good architecture, especially when my log file can get very large. Even after starting logstash-forwarder with -tail = true on the command line, it only harvested the file and did not process the events. That's why I switched to Filebeat.

Now Filebeat tails the file when tail_files = true is set in the configuration file, but it is not pushing the logs to Logstash dynamically, meaning I have to manually restart Filebeat each and every time to send the logs from Filebeat to Logstash. I am really pissed off with this behavior of Filebeat.

Further, the console output of logstash-forwarder was very friendly, as it told me how many events were being processed, but when I run Filebeat using service filebeat start it does not tell me how many events it sent. It also tries only once to connect to Logstash, unlike logstash-forwarder, which keeps trying to connect to the Logstash server. So please let me know about this ASAP. We need to get production ready pretty soon.


(Mark Walkom) #2

With all due respect, commenting in such a manner certainly won't endear you to those that are likely to help, least of all the developers that write it for your use.

So you're saying that filebeat processes the file but doesn't send it to LS at all? How do you know it processes it?
What do the logs say? Have you tried to run FB with the debug options - -v -d "*"? What does your config look like?


(Abinay) #3

@warkolm No, what I am saying is that Filebeat is not sending the logs dynamically. By this I mean: let's say I had 4 logs in my file to be processed. When I run Filebeat with the "service filebeat restart" command, it sends the last added logs to Logstash. Now let's assume I add two more logs to the file. Technically Filebeat must send these logs to Logstash on its own because it is running (that's how Logstash Forwarder does it, and I was happy with this feature of LSF). But unfortunately it is not sending the logs on its own. I have to restart Filebeat with the command "service filebeat restart" to send the logs.


(Mark Walkom) #4

How do you know it processes it?
What do the logs say? Have you tried to run FB with the debug options - -v -d "*"? What does your config look like?


(Abinay) #5

It processes it for sure after restarting Filebeat, as I can see my logs in Elasticsearch. This is my configuration file:
################### Filebeat Configuration Example #########################

############################# Filebeat ######################################
filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # Each - is a prospector. Below are the prospector specific configurations
    -
      # Paths that should be crawled and fetched. Glob based paths.
      # To fetch all ".log" files from a specific level of subdirectories
      # /var/log/*/*.log can be used.
      # For each file found under this path, a harvester is started.
      # Make sure no file is defined twice as this can lead to unexpected behaviour.
      paths:
        - /home/abhinay/Downloads/error.log.1
        # - /var/log/*.log
        # - c:\programdata\elasticsearch\logs\*

      input_type: log

      # Optional additional fields. These fields can be freely picked
      # to add additional information to the crawled log files for filtering
      fields:
        product: order_online
      #  level: debug
      #  review: 1

      # Set to true to store the additional fields as top level fields instead
      # of under the "fields" sub-dictionary. In case of name conflicts with the
      # fields added by Filebeat itself, the custom fields overwrite the default
      # fields.
      fields_under_root: true

      # Type to be published in the 'type' field. For Elasticsearch output,
      # the type defines the document type these entries should be stored
      # in. Default: log
      document_type: log

      tail_files: true

###############################################################################
############################# Libbeat Config ##################################
# Base config file used by all other beats for using libbeat features

############################# Output ##########################################

# Configure what outputs to use when sending the data collected by the beat.
# Multiple outputs may be used.
output:

  # Logstash as output
  logstash:
    # The Logstash hosts
    hosts: ["52.74.2.176:5000"]

    # Optional TLS. By default is off.
    tls:
      # List of root certificates for HTTPS server verifications
      certificate_authorities: ["/etc/pki/tls/certs/logstash-forwarder.crt"]

############################# Shipper #########################################

shipper:

logging:

  # Send all logging output to syslog. On Windows default is false, otherwise
  # default is true.
  #to_syslog: true

  # Write all logging output to files. Beats automatically rotate files if rotateeverybytes
  # limit is reached.
  #to_files: false

  # To enable logging to files, to_files option has to be set to true
  files:
    # The directory where the log files will be written to.
    #path: /var/log/mybeat

(Abinay) #6

NOTE: many lines (comments, though) have been omitted because I can post only 5000 lines here.


(Mark Walkom) #7

Just post the non-commented lines, or use gist/pastebin/etc and link :slight_smile:


(Abinay) #8

OK, this is the non-commented file:

filebeat:
  prospectors:

    -
      paths:
        - /home/abhinay/Downloads/error.log.1
      input_type: log
      fields:
         product: order_online
      fields_under_root: true
      document_type: log
  registry_file: /var/lib/filebeat/registry

      tail_files: true
output:
  logstash:
    hosts: ["52.74.2.176:5000"]
    tls:
      certificate_authorities: ["/etc/pki/tls/certs/logstash-forwarder.crt"]
shipper:
logging:
  files:
    rotateeverybytes: 10485760

(ruflin) #9

@abinary I edited your post so the config is readable. Please use 3 backticks in the future to paste code.

The tail_files config looks like it is not where it should be (under the prospector). In general we recommend running once with tail_files and then restarting Filebeat with the option disabled to prevent any data loss in the future. It will continue at the last state of all the files it already discovered.

Please run the command recommended by @warkolm to see the debug output when no further lines are forwarded.
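
For readers following along, here is a minimal sketch of what the placement described above could look like, assuming the rest of the config from post #8 stays unchanged. The only change is that tail_files moves up into the prospector entry, at the same indentation as paths and input_type. This is an illustration of the indentation, not a tested production config.

```yaml
filebeat:
  prospectors:
    -
      paths:
        - /home/abhinay/Downloads/error.log.1
      input_type: log
      fields:
        product: order_online
      fields_under_root: true
      document_type: log
      # tail_files is a per-prospector option, so it belongs inside the
      # prospector entry, next to paths and input_type.
      tail_files: true
  # registry_file is a filebeat-level option, one level above the prospectors.
  registry_file: /var/lib/filebeat/registry
output:
  logstash:
    hosts: ["52.74.2.176:5000"]
    tls:
      certificate_authorities: ["/etc/pki/tls/certs/logstash-forwarder.crt"]
```

Following the recommendation above, tail_files would be set to true for the first run only and then disabled (or removed) before the next restart, so that Filebeat continues from the offsets stored in its registry file.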


(Abinay) #10

@ruflin Thanks for replying. So you mean that I need to keep tail_files = true only in the beginning, and thereafter it will tail the file even with tail_files = false? Can you please specify the full command suggested by @warkolm? Currently I am running Filebeat by typing the following in the terminal (with superuser privileges) -
service filebeat restart


(Steffen Siering) #11

Which operating system are you running on? Why do you have the tailing option enabled by default? Actually the task of logstash-forwarder and Filebeat (whose codebase is derived from logstash-forwarder) is to tail files. If the option is enabled, the content of a file will be ignored the first time it's seen.

Do not use the init script to debug filebeat, but start filebeat itself. For example

$ filebeat -e -v -d '*' 

How often are your log files updated? There is an option ignore_older that ensures old files are not processed.


(Abinay) #12

@ruflin Unfortunately your strategy is not working. I first ran Filebeat with tail_files = true, then changed it to tail_files = false (and also tried commenting it out), and then restarted Filebeat, but logstash was not tailing the file. And because it was not tailing the file, the same log was getting pushed to Elasticsearch, creating duplicates. This was the reason why I shifted to Filebeat from logstash-forwarder. Do you have any other way?


(Abinay) #13

I am using Ubuntu 14.04 LTS. @steffens, there is no fixed interval at which the log files are updated. It can be anything, from 1 second to 1 week. The command you specified is also not working. None of the flags give any output. The only thing I see on the console is -

  • Restarting Sends log files to Logstash or directly to Elasticsearch. filebeat [ OK ]

(ruflin) #14

@abinary What do you mean by "logstash was not tailing the file"? Are you using Filebeat and Logstash to tail the files? And then you mention it was pushed to Elasticsearch? Do you have more than one output enabled in Filebeat? Or what was pushing the data into Elasticsearch?

About the command above by @urso: Did you use the filebeat binary directly? You should add the -c flag and point it to your config file.


(Abinay) #15

@ruflin No, I am only using Logstash to take logs from Filebeat. This is my architecture: I have Filebeat installed on my local system (it will later be installed on a server). Filebeat takes the logs from a file (as specified above in my Filebeat config) and ships them to Logstash. Logstash parses the logs shipped by Filebeat and sends them to Elasticsearch. The logs in Elasticsearch can then be viewed in Kibana.

What I want is for the file to be tailed by Filebeat, meaning only the most recent additions to the file should be sent by Filebeat to Logstash. For example, say my file earlier had 4 logs and Filebeat shipped those to Logstash. An hour later, 3 more logs were added to the file (I do this by copying some of the 4 logs above and changing their time and content). Filebeat must then send only these 3 added logs to Logstash, not the whole file with 7 logs.

What is happening is that when I set tail_files = false in my Filebeat config, Filebeat sends the whole file, so the same log is pushed more than once whenever the file is updated. This takes up extra space unnecessarily. That's why I set tail_files = true. But the problem after doing this is that when updates are added to the file, Filebeat does not send them on its own. It needs to be restarted with the command "sudo service filebeat restart". Only then does it send the updates (only the 3 added logs) to Logstash. But I want automation.

I think you now understand my problem clearly, @ruflin? Let me know if you want more elaboration.


(Steffen Siering) #16

@abinay your description matches exactly how filebeat is supposed to work. That's why I've been asking about debug output.

The only reason for Filebeat to send the complete file again is if some file identifier, e.g. the inode, changes randomly, which might happen with a shared filesystem (e.g. samba/NFS).

If files are not updated very often, try to increase ignore_older.
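
As a rough sketch of the ignore_older suggestion above: the option is set per prospector and takes a duration. The value 168h (one week) is purely illustrative, chosen here only because the file in this thread is said to be updated anywhere between every second and once a week; it is an assumption, not a recommended setting.

```yaml
filebeat:
  prospectors:
    -
      paths:
        - /home/abhinay/Downloads/error.log.1
      input_type: log
      # Illustrative value only: files that have not been modified within
      # this window are not processed, so a rarely updated file needs a
      # window at least as long as the gap between its updates.
      ignore_older: 168h
```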


(Abinay) #17

@steffens But Filebeat must send the logs to Logstash by itself, mustn't it? This is the basic functionality that every shipper must provide, don't you think? Logstash Forwarder keeps trying to connect to Logstash if Logstash is not running. And once it is connected, it keeps sending the logs (though the whole log file, not just the tail) whenever it sees a change in the log file. This very thing is missing in Filebeat. Further, Filebeat does not have a very friendly console. With Logstash Forwarder I was able to see how many events it was processing, harvesting, etc. But with Filebeat I only see -

Restarting Sends log files to Logstash or directly to Elasticsearch. filebeat [ OK ]

and nothing else. I was satisfied with Logstash Forwarder in every aspect except that it does not tail the file. And now that I have found the file-tailing functionality in Filebeat, Filebeat does not send the logs dynamically. Unfortunately I am just wandering between shippers...


(Steffen Siering) #18

This is exactly the basic functionality filebeat provides. That's why we're asking for debug output by running filebeat with -v -d '*'. Filebeat detects connection to logstash being down and reconnects!

Have you tried to enable the info log level? The default log level is error. Running Filebeat with -v sets the log level to info.
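
Since Filebeat is being started through the init script in this thread, where command-line flags are not passed on, the same log level can also be set in the config file. A minimal sketch, assuming the logging section from the posted config; the path is taken from the commented-out example there and the rotation size from post #8:

```yaml
logging:
  # The default level is error; info matches what -v gives on the command line.
  level: info
  # Write Filebeat's own log output to files so it is visible even when
  # Filebeat runs as a service.
  to_files: true
  files:
    path: /var/log/mybeat
    rotateeverybytes: 10485760
```

This only makes the service's own log output visible. For the full debug output requested in this thread, the binary still has to be run directly with the debug flags.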

What's your concept of tailing a file? Both Logstash Forwarder and Filebeat are designed to basically tail a file starting at the beginning of the file (but logs should be sent dynamically). If the tail option is enabled (both in Logstash Forwarder and in Filebeat), the content already present in a file is actually skipped and tailing starts from the end of the file. This is actually a very niche case, making me wonder why it is required. What's your setup?

Questions:

  1. Which Filebeat and Logstash versions are you using?
  2. Is the file available locally to Filebeat on the same machine, or is some shared/network filesystem in use (i.e. NFS/samba)?
  3. At which rate are your log files updated?
  4. Have you tried to run Filebeat in debug mode with console output? If so, can you share the debug output with us?
  5. Have you already updated your config file? @ruflin clearly pointed out an error in the config file.
  6. Why is tail_files=true required?

(Abinay) #19

@steffens See, this is how I am running Filebeat. I installed it and then I run it by typing the command below -

sudo service filebeat restart

And the output is -

  • Restarting Sends log files to Logstash or directly to Elasticsearch. filebeat [ OK ]

Even running it like -

sudo service filebeat restart -v -d '*'

gives the same output as above.
Let me know if I should start Filebeat with some other command.
By tailing I mean that only recent additions to the log file should be sent by Filebeat to Logstash. Refer to the example I gave above. I don't want to tail the log file from the beginning. It would simply be overhead. I want to tail the file from the most recent offset. I think you now understand what I mean by tailing the file. Now to your questions -

  1. My Filebeat version is filebeat version 1.0.1 (amd64).
  2. Yes, the log file is on the same machine on which Filebeat is running.
  3. There is no fixed interval between file updates.
  4. As shown above, I only see that output when I run "sudo service filebeat restart -v -d '*'".
  5. Yes, there are no errors in the config file.
  6. I keep saying that tail_files = true is required to send only the changes from the most recent offset of the log file.

(ruflin) #20

Filebeat is built on top of logstash-forwarder, so it provides the same functionality and more. As @steffens already described, what you describe under number 6 is the core feature of Filebeat. It sends every line only once, even if files are rotated with logrotate. No tail_files is required in your case. As tailing files works for all the users I know of so far, we would like to understand what is going wrong in your case. For this we need the debug output.

The command you run is starting the service. What we need is that you run your binary directly. Assuming that you installed filebeat with the debian package, this would be: /usr/bin/filebeat -c pathtoyourconfig -e -d "*"

After running this and making your updates to the file, please provide us with the output so we can help you further.