Filebeat appears to drop files randomly

Hello,

I am running Filebeat on a server where my script offloads messages from a queue as individual files for Filebeat to consume.
The setup works fine, but every now and then I see that the number of files doesn't match the number of records in Kibana. Where are these missing files going? Is Filebeat dropping them?

Here's some more detail.

I have MQ queues which have messages. I wrote a script to offload these messages, each message as a single file, and I am running Filebeat to pick these files up. In my testing I am seeing that if there are 300 messages (hence 300 files), Kibana sometimes shows only 299 records, sometimes 294. It is not getting all 300 files. Another thing I noticed is that if I have my script offload more messages, say 1000 (hence 1000 files), the files appear to be broken or only show partial messages in Kibana.

Is there a setting I am missing to ensure all files are picked up? Is the issue in Filebeat or Elasticsearch? I did a bit of troubleshooting on a run that showed 299 records instead of 300, and went through my filebeat.log to check the names of all the files that were picked up. It missed one file.
What am I doing wrong? The files are not big, about 1100 bytes each.

What's the format of the files you want to index? Are they plain log files with newline characters as separators, or something more specialised?

Can you share your filebeat configuration?

Do you have samples (names) of the missing files? Have you checked the registry file to see whether a file is indeed missing?

Are files deleted at some point in time?

Filebeat tries to publish events as fast as possible, but will get backpressure from the downstream system (in this case Elasticsearch). Once the internal queues are full, Filebeat will block until some more events have been published.
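For reference, the main knobs that control this internal queueing and batching are roughly the following; this is only a sketch, assuming Filebeat 5.x, showing the documented defaults rather than tuning advice:

# Sketch only, assuming Filebeat 5.x (the prospectors-style config);
# values shown are the documented defaults, not recommendations.
filebeat:
  spool_size: 2048      # events buffered internally before being forwarded to the output
  idle_timeout: 5s      # flush the spooler even if it is not yet full

output.elasticsearch:
  worker: 1             # concurrent connections to Elasticsearch
  bulk_max_size: 50     # events sent per bulk request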

Hi steffens,

The file format is XML.

This is the file content.

 * DMPMQMSG Version:8.0 Created:Tue Sep  5 14:49:16 2017
 * Qmgr  = LOGD1
 * Queue = TEST.QUEUE

N
T <LogRecord version="1.0"><trackingID>07020701050309020902020600000204</trackingID><originationTimestamp>2017-07-27T15:39:29.225-05:00</originationTimestamp><firstCallingProgram>DXCG9092</firstCallingProgram><sourceProgram>Product ID Component</sourceProgram><serviceInstance>DEVL</serviceInstance><serviceFunctionalArea>ProductID</serviceFunctionalArea><messageName>findProduct</messageName><messageVersion>4.0</messageVersion><userId>DS60024</userId><tier1ReturnStatus> 0</tier1ReturnStatus><tier2NameSpace>General</tier2NameSpace><tier2MessageNumber>1</tier2MessageNumber><tier2MessageText>Successful</tier2MessageText><tier3ProgramName></tier3ProgramName><tier3MessageCode></tier3MessageCode><tier3MessageText></tier3MessageText><currentDateTime>2017-07-27T20:39:29.357Z</currentDateTime><eventType>s</eventType><applicationSupportGroup>Integration</applicationSupportGroup><loggingProgramName>DEV_ServiceFlow-Find</loggingProgramName></LogRecord>


###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.full.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

#=========================== Filebeat prospectors =============================

filebeat.prospectors:
- input_type: log

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /tmp/IAF*


  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  exclude_lines: ["DMPMQMSG|^N|Queue|Qmgr"]
  #close_eof: true

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
  env: devl

#================================ Outputs =====================================

# Configure what outputs to use when sending the data collected by the beat.
# Multiple outputs may be used.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["10.204.16.105:9200"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"
index: "filebeat"

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: critical, error, warning, info, debug
logging.level: debug
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat.log
  keepfiles: 7


# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

I checked the Filebeat log to see how many files were harvested and which ones. It clearly missed one file that was in the /tmp directory.
Yes, the files are deleted. My script cleans up the directory before putting new files in there.

Please let me know if you need any more info.

The file format is XML.

Is it one XML document per line?

Multiple documents in one file?

Does an event/document always end with a newline?

Yes, the files are deleted. My script cleans up the directory before putting new files in there.

Have you checked indexing rates in Elasticsearch? Maybe files get deleted before Filebeat can process them due to backpressure from Elasticsearch?

At which rates do you delete and write files?

It clearly missed one file that was in the /tmp directory.

What's the exact filename? Have you checked the registry and the Filebeat logs for said file?

How many files are present at once? Any errors or interesting info messages in the Filebeat logs? E.g. are you running out of file descriptors?

Yes. One XML document in one file, one XML line per file. I am excluding the first 4 lines. Yes, the document always ends with a newline.

Can you tell me how to check indexing rates? I am generating the index stats with the API but don't really see an issue. I tried running my script at 15-minute intervals and still see files being missed.
The first thing my script does is clean up the directory by deleting the old files, then it writes new files to the directory.
I checked the Filebeat logs for the missing file and see that it was harvested. I see all the files being harvested; they just don't show up in Kibana. My logs are relatively clean. Nothing in the logs except files opened for harvesting and closed at EOF. I can try setting the log level to debug and see if I can catch anything.

My guess is that Filebeat is harvesting all the files, but Elasticsearch is dropping them due to some threshold settings.

I tried this again with multiline options set. I am using the following multiline options.

multiline.pattern: '^T <LOG'
multiline.negate: true
multiline.match: after

And I am trying with 8582 files (the number of messages on my queue). But when I look at the index stats, I only see 8488 in the doc count.

I ran Filebeat with the log level at debug and still see nothing to indicate an issue. A quick grep and wc -l on the Filebeat log shows me all 8582 files are being harvested.

Yes. One XML document in one file, one XML line per file. I am excluding the first 4 lines. Yes, the document always ends with a newline.

Given this information, I would say you don't need multiline.

I checked the Filebeat logs for the missing file and see that it was harvested. I see all the files being harvested; they just don't show up in Kibana. My logs are relatively clean. Nothing in the logs except files opened for harvesting and closed at EOF.

Have you also checked the Filebeat registry file (it's pure JSON) to confirm all files are present and that the offsets point to the end of each file (offset > 0)?

Is there a chance that all the content in a file is being filtered out by your exclude_lines setting?
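For reference, here is how I read the current exclude_lines pattern against the sample file shared above, assuming each physical line is matched independently (i.e. without multiline):

  # Sketch: my reading of the current exclude_lines pattern against the sample file above.
  exclude_lines: ["DMPMQMSG|^N|Queue|Qmgr"]
  #  "* DMPMQMSG Version:8.0 ..."        -> dropped (contains DMPMQMSG)
  #  "* Qmgr  = LOGD1"                   -> dropped (contains Qmgr)
  #  "* Queue = TEST.QUEUE"              -> dropped (contains Queue)
  #  "N"                                 -> dropped (matches ^N)
  #  "T <LogRecord ...>...</LogRecord>"  -> kept and shipped as the event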

My guess is that Filebeat is harvesting all the files, but Elasticsearch is dropping them due to some threshold settings.

No idea whether Elasticsearch silently drops documents. Normally some error code is returned and Filebeat will retry until all events get a 200 OK response.
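One way to see those bulk responses and retries in the logs is to narrow the debug output to the publisher side. This is only a sketch: "publish" is one of the selectors mentioned in the config comments above, while "elasticsearch" is my assumption for the ES output's logger name:

  # Sketch: "publish" is listed in the config comments above;
  # "elasticsearch" is an assumed selector name for the ES output.
  logging.level: debug
  logging.selectors: ["publish", "elasticsearch"]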

Could you try using the file output and see if you get the same issue? That way we can remove one moving part and see whether the problem is on the Filebeat side or more output/ES related.
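A minimal sketch of such a test setup, using the file output options from the Filebeat docs (the path and filename values here are just placeholders):

  # Sketch: comment out output.elasticsearch while testing and write events to disk instead.
  # path and filename are placeholder values.
  output.file:
    path: "/tmp/filebeat-out"
    filename: "filebeat-events"

If the file on disk ends up with all 8582 events, the problem is on the output/Elasticsearch side; if not, it is in Filebeat's harvesting or filtering.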

How exactly do you write the files? Could you share this code?

Is there a reason you don't use close_eof? As far as I understand, your files are written only once.
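A minimal sketch of what that would look like on the prospector from the config above (everything else unchanged):

  # Sketch: close the harvester as soon as it reaches end of file, since each
  # file is written once and never appended to.
  filebeat.prospectors:
  - input_type: log
    paths:
      - /tmp/IAF*
    exclude_lines: ["DMPMQMSG|^N|Queue|Qmgr"]
    close_eof: true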

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.