Filebeat 5.0 with multiline, split event data to two events

I am trying to use timeout: -1
In the Filebeat config file, I am getting the following message:

Exiting: Error in initing prospector: negative value accessing 'filebeat.prospectors.0.multiline.timeout' (source:'/opt/sw/filebeat/ymls/qa-app01/filebeat_applogs.yml')

Ori

Sorry, I think timeout: 0 should disabled timeouts.

but timeout will not protect you from weird buffering effects, as file pointer is already advanced once this happens.

I set it to be timeout: 0, It still splits events........

If I understand you correctly, split events means an events starting with all \u0000? You using network shares? NFS (via udp or tcp) or SMB?

Events being split normally shows in event starting right in the middle without all those zeros.

The \u0000 is not due to filebeat splitting events, it is due to the OS presenting all those zeros to filebeat on read-time, which might be caused by data being processed out of order.

Maybe one can implement a workaround in filebeat checking for the NULL character, backing of a little and retry reading in the hope the missing content has already been received.

Hi Steffens,

The split event, is not necessarily starting with \u0000.
It split in the middle of the message text.

I am using CIFS in linux to access log files on windows app servers.

Ori

did you disable the timeout? If so, you still see incomplete messages without NULL characters?

Yes, I set it to be timeout: 0

There are still splitted events. Why is that ?

Ori

Hi Ori

If I remember correctly, the hole issue started because you were running filebeat on Windows and it used too many resources. Then you started using network drives and reading the files from a windows machine which lead to many other interesting issues. As we are not in control on what exactly happens on shared network drives I would suggest that we get back to the windows resource issue instead of trying to debug issues related to network drives. If there is unreasonable high resource usage on Windows, I'm sure we can find a way to fix it. If strange things happen on network drives which don't happen on non network drives, I'm not so confident.

WDYT?

This topic was automatically closed after 21 days. New replies are no longer allowed.

Hi Ruflin,

I tried to play with the scan_frequency parameter, it did not help.
The problem is that we have 22 different log types on the same server.

Currnetly working with one dedicated WINDOWS machine as a Filebeat node, which connects to other WINDOWS machines using the FQDN such as: \app-server\logs\type1\type.log.*
seems to be working fine.
One execption is for multiline events, which gets splitted although setting timeout: 0

Ori

Any chance that we could get some memory profiling from your filebeat instances on windows to further investigate the memory issue?

Sorry for the delay, a lot of holidays.....

We got a decision to use one central machine to serve as the Filebeat server, which access the remote APP Servers using FQDN.
There are few problems with some logs, but I think it is related to the way they are managed and generated.
One exception is for the particular type which still being splited up several times, although I used timeout=0 in the config file.

I think I found when it happens....
Seems like the last event in the logfile.
The logfile is being renamed and a new log file is created.

file.log
file.log.1
file.log.2

The splitted event holds a marker of the end of the file (few rows which are being printed).
So probably not a timeout matter......

Why renaming a file can cause it ?
Filebeat should be able to overcome it.

Ori

In general renaming a file does not change the file handler. So the harvester itself doesn't even get notified that a renaming happened. This is the case for local disk. TBH I'm not sure if the handling is exactly the same for a mounted disk as the renaming command is happening "remotely" perhaps it could be that the file handler is closed (honestly I would have to dig here into the details of these file system to full understand if anything special happens). Do you see anything special in the filebeat log files like that the handler was closed or had an error?

Hi Ruflin,

Thanks for the reply.

  1. What should be the log level ?
    Is the default enough ?
  2. I have checked on the Resource Monitor which comes with windows,
    Also increased the CLOSE_INACTIVE value.
    on the Resource Monitor, I can see the filebeat holds the handles to all files, including renamed files.
    Is there a way to check if it is considered a new file after renaming it ?

I am accessing the files using FQDN:
\Server-Name\Share-Name\LogType1\app.log*

Filebeat runs on Windows Server, which accesses log files on remote Windows Servers.

Thanks,

Ori

Hi Ruflin,

I made some tests by myself.
Took on file with a multiline event and within a loop spilled its content into a larger log file.
after it reached a certain size, renamed the larger log file.

Created 4 such log files.
file.log -> file1.log.4
file.log -> file1.log.3
file.log -> file1.log.2
file.log -> file1.log.1

I tested the following scenarios:

  1. Letting the Harvester close the file, put the one event one more time and then rename the file
    All Events were sent to Elastic successfully.
    I could see in log file messages of detecting rename of a file
    but it was not there for all files.
  2. Letting the Harvester close the file, rename it and then put the one event one more time in the renamed file
    All Events were sent to Elastic successfully.
    I could see in log file messages of detecting rename of a file
    but it was not there for all files.
  3. Same as the first one, only that let the harvester some more time to close the file, and setting a random sleep between each write into the large log file
    All Events were sent to Elastic successfully.
    I could see in log file messages of detecting rename of a file
    This time I could find the message for all files.
    All data was from the file.log, except the 4 last events per loop, which is written to the file before it is being renamed, and the Harvester then picks it up
  4. Same as second one.
    All Events Except the 4 last ones were sent to Elastic successfully.
    I could see in log file messages of detecting rename of a file
    This time I could find the message for all files.
    The offset in the log file indicated that it did not recognized the last event.
    Only after a while, when restarted Filebeat, it sent the 4 last messages.
    I also tried to put some more events but did not help, only the restart of filebeat sent them all.

So, It seems that it does recognize files which are being renamed, also by accessing using FQDN.
What worries me is the last test, which I do not know why it did not recognize the files were updated.

I am using version 5.0.0 Alpha 5, will also try to use the 5.0.0 GA release.

Let me know if you are interested in the scripts and logs of my tests, and where I can place them.

Unfortunately, I wanted to find what is causing the filebeat not to send the last event (split it), but I could not find it with my tests.
So, we still encounter the problem, without any solution......

Ori

I wonder if close_inactive could also be a reason for events being split. On close the multiline buffer is flushed. If file is re-opened due to activity the offset from after last flush will be used.

I thought about it too.
I have increased its value, but still same thing occur.

Sometimes it just not sending some of the last lines of the last event in the file, as part of the last event, but sending them as an event, which cannot be parsed, due to lack of fields.

Ori

Hi,

I started to use the Filebeat 5.0.0 (Not Alpha or Beta or RC1).
I have started the filebeat with Log level of Debug,
Monitored the logs generated and data being inserted into Elastic.

In the log files, there are messages like the followings:

Line 62884: 2016-10-30T11:27:15+02:00 DBG File rename was detected: \qa-app02\Logs\AppLogs\RT-90-2_APP_9_0.log -> \qa-app02\Logs\AppLogs\RT-90-2_APP_9_0.log4495, Current offset: 1055911
Line 62885: 2016-10-30T11:27:15+02:00 DBG File rename detected but harvester not finished yet.
Line 166609: 2016-10-30T11:30:18+02:00 DBG File rename was detected: \qa-app02\Logs\AppLogs\RT-90-2_APP_9_0.log -> \qa-app02\Logs\AppLogs\RT-90-2_APP_9_0.log4495, Current offset: 1055911
Line 166610: 2016-10-30T11:30:18+02:00 DBG Updating state for renamed file: \qa-app02\Logs\AppLogs\RT-90-2_APP_9_0.log -> \qa-app02\Logs\AppLogs\RT-90-2_APP_9_0.log4495, Current offset: 1055911
Line 168590: 2016-10-30T11:30:26+02:00 INFO Non-zero metrics in the last 30s: libbeat.es.published_and_acked_events=233 libbeat.es.publish.read_bytes=2050 libbeat.es.publish.write_bytes=592747 filebeat.harvester.open_files=1 filebeat.prospector.log.files.renamed=1 registrar.states.update=236 filebeat.harvester.running=1 publish.events=236 registrar.writes=4 libbeat.publisher.published_events=233 registar.states.current=1 libbeat.es.call_count.PublishEvents=4 filebeat.harvester.started=1

Which means that it identifies files which are being renamed.

For the last couple of hours, there were not splitted events.
Will continue tracking and update......

Thanks,

Ori

1 Like