Filebeat Stop Cleaning Registry


(Gabry993) #1

Hi,
I've got a problem related to log rotation and registry clean up.
This is how the rotation works: after reaching a certain size, the file "logxxx.log" get renamed in "logxxx-timestamp.log" and a new "logxxx.log" is created.
This is my config (I've tried both alpha5 and alpha6 nightly build on a Windows Server 2000)

-input_type: log

Paths that should be crawled and fetched. Glob based paths.

paths:
#- /var/log/.log
- d:Logger\logxxx.log
exclude_lines: ["^\s
$"]
include_lines: ["^([a-z]|[A-Z])"]
fields:
catena: 11
fields_under_root: true
ignore_older: 30m
close_inactive: 2m
clean_inactive: 40m
document_type: log_xxx

During the day those file are written very often and everything seems fine with my config: the registry get cleaned and in task manager I can see that filebeat mem usage remains constant.
However, during the night, when log files get updated seldomly, this line appears many times in filebeat log files:

ERR File is falling under ignore_older before harvesting is finished. Adjust your close_* settings: d:Logger\logxxx.log

This message is written until morning, when our log files are return to be updated every few seconds, then it dissapears. Although that, now the registry keeps growing bigger and bigger and it seems that in can't be cleaned. If I delete the registry manually, filebeat starts working correctly until the night and so on...
Am I missing something with the config? Should I try setting ignore_older to a value even bigger compared to clean_inactive?

Moreover, is there a parameter to get info about the registry and file handler from the log which is not debug?
Thank you!
Gabriele


(ruflin) #2

Hi Gabriele

There was a bug in the nightly builds of the last 48 hours that prevented clean_inactive to work correctly. It could be that you fetched exactly one of this versions. It is not fixed in the most recent snapshot. Could you try again?

About your config above: It looks like the indentation is kind of messed up. Is that just here or does your config file look that way?

To make sure files don't stay open, it is important that events can be shipped fast enough. Which output are you using? How many events per second (log lines) do you have?


(Gabry993) #3

Hi,
about the bug: if "it is not fixed" which version should I try? Or did you mean that "it is now fixed" so I can try the latest nightly? However, I had the same issue also on alpha5.

About the config: my fault, I am not good with "blockcode", but the indentation is correct in my config file.

I am sending events to 3 logstash instances with load balance set to true.
The events per second may vary during the day, but it should be 100 lines per second for each file, so 700 lines per second (we are testing only one logger).

Thank you!


(ruflin) #4

Sorry for the typo. I meant: It is now fixed. Could you try the nightly snapshot?

The strange part is that you report this happens, when the log files are not really udpated and it returns back to normal when you have a higher traffic. I would expect the opposite.


(Gabry993) #5

I would expect the opposite too.
I wonder why when it returns back to normal it also stops cleaning the registry, What should I see in the log file when registry gets cleaned? Is it info or debug level?
Now I am trying the nightly, I will update you!
Thank you


(ruflin) #6

On the debug level there is a message that tell you how many states were cleaned up. There is currently nothing on info. But one thing we introduced is that every 30s a summary is logged to info about the changes. If lots of states are removed, you should also see there a value changing. The second part I didn't double check.

I hope the latest nightly fixes the issue.


(Gabry993) #7

Hi,
I've let it run all the weekend and this morning the registry size is only 2KB so I think that filebeat is cleaning it.
However, I still find

ERR File is falling under ignore_older before harvesting is finished. Adjust your close_* settings: d:Logger\logxxx.log

during the night. In this moment it does not seem a problem, but I can't figure out why it occurs.

Thank you again for your support!


(Gabry993) #8

Hi,
the

ERR File is falling under ignore_older before harvesting is finished. Adjust your close_* settings: d:Logger\logxxx.log

still occurs during night, however we solved the problem with the growing registry: we have increased the --pipeline-batch-size in logstash and the flush_size in the elastic output. Now everything seems ok during the day, but we found 2 type of error which I can't understand.
We got many occurrences of:

ERR Failed to publish events caused by: EOF
INFO Error publishing events (retrying): EOF

And:

ERR Failed to publish events caused by: read tcp ip:1157->ip:port: i/o timeout

Are they serious problem? I can't understand it as log messages are correctly sent to logstash and elasticsearch.


About clean_removed in Filebeat 5.0.0-alpha5
(ruflin) #9

The first error is shown, in case a file did not finish harvesting yet but already falls under ignore_older. This can for example happen when a file is not updated anymore, but the harvester can't finish reading it as the output is not available or too slow. As soon as the output catches up, the file will be closed and ignored.

This brings us to the second errors: There seem to be some issues with the output which explains the first issue. As filebeat keeps retrying sending log messages on errors and you get all messages, it seems like these are some temporary connection issues.

Do you get any errors on the Logstash side?


(Gabry993) #10

I was not able to get logstash logs until today!
I get:

{:timestamp=>"2016-09-02T07:34:57.774000+0000", :message=>"Beats input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover.", :exception=>LogStash::Inputs::BeatsSupport::CircuitBreaker::HalfOpenBreaker, :level=>:warn}

Edit: also

{:timestamp=>"2016-09-02T07:39:30.252000+0000", :message=>"Beats input: the pipeline is blocked, temporary refusing new connection.", :reconnect_backoff_s
leep=>0.5, :level=>:warn}
{:timestamp=>"2016-09-02T07:40:15.443000+0000", :message=>"CircuitBreaker::rescuing exceptions", :name=>"Beats input", :exception=>LogStash::Inputs::Beats
::InsertingToQueueTakeTooLong, :level=>:warn}

Does this mean that logstash can't handle all the requests? Should I reduce the --pipeline-batch-size and the flush_size?


(system) #11

This topic was automatically closed after 21 days. New replies are no longer allowed.