Filebeat - High CPU for windows machines

Hi,

We are noticing the Filebeat consuming a lot of CPU on windows monitored machines.
It can be above 20% and also above 40% all the time.

The Filebeat suppose to be a light program which should not be felt at all.
What can be done to lower its impact on the machines ?

Thanks,

Ori

Can you share some more details? Config file? Filebeat version? OS? Events per Second? Setup?

Hi Ruflin,

Filebeat version is 1.2.3
Filebeat server is Windows 2008 R2
Filebeat is sending logs to Logstash on a RedHat Enterprise linux Server.
On the same server there is an Elasticsearch Instance installed.
Elastic + Logstash are Version 2.1.1

Max events Per second was 1255 events.
A lot of processing work is being done by the Logstash instance.
On the index level, we also using nGram analysis.

The filebeat config file, holds 22 Different prospectors.
These are different log files on the same machine, in which we are distinguish by type of log file.
Some are multiline, but it is being handled on the logstash side, not yet on the filebeat side.

Thanks,

Ori

My first idea was that multiline could cause the high cpu but as you wrote that happens on the LS side. Can you share your filebeat config file to see if there are perhaps some other potential issues? What is your scan frequency?

Hi Ruflin,

Scan frequency is the default value.
I shared the config file in the following:

Last two posts by me.

Thanks,

Ori

Hi Ori

I finally also responded on the other thread. I'm curious if the high CPU usage could be related to the number of files you are harvesting and that you potentially hit a limit there? How many files do you have know harvesting at the same time? Did you have the same issues with 5.0?

Hi Ruflin,

I am currently only on DEV stage with Version 5, since I would like to switch Logstash with the Ingest node.

Each prospector is defined to work against a certain folder with one active log file.
Of course that there can be a rotation, and within a peak hour, there can be few still active for the same prospector.
However, in the config, there is a parameter of close older with 5m value.

  1. What is the best way to check it ?
  2. I think that also the mechanism which track last modified date for files listed in the registry which are still present in the folder, has also something to do with it.

Ori

Do you see the issue also no your Dev machines with Filebeat 5.0 builds?

  1. What is the best way to check it? To check what? How many files are harvested? How many states are there? Best way is the log files. If you enable debug level, all the information and details should be there. Logging has also been improved for 5.0 and already with INFO level the necessary infos should be there. Every 30s infos are printed out?

  2. Not 100% sure I understand this. Are you referring to how often the registry file is written? What is the size of your registry file?

Hi Ruflin,

Not yet able to know regarding Version 5.0

For section 2, I meant that maybe the process to check if files has been updated for the last IGNORE_OLDER value is consuming also high amount of CPU.

Ori

Let me know as soon as you have more information regarding 5.0. I'm still hoping it does not happen there.

For 2: I would hope this is not the case. Lets wait until we know more about 5.0 and dig deeper then.

Thanks Ruflin!!!!

One more thing, with Filebeat 5 Alpha 4, The filebeat is not releasing it's lock on the file after CLOSE_INACTIVE is reached.
I tryied to delete it, but could not, also tested with CLOSE_REMOVED and still could not delete it, the lock was still there, and so was the file.

Ori

Could it be that somehow the file is still harvested? Did you see somewhere a message like couldn't remove state because not Finished? close_removed will only apply after the file is removed. So you should be able to remove the file and then the handler is removed.

What do you mean by you couldn't delete it?

Can you make sure to test alpha5 or best the nightly build as several improvements were done since alpha4? https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/

Parameter CLOSE_INACTIVE is replacing CLOSE_OLDER, am I right ?

I waited the CLOSE_INACTIVE time, and after that deleted the file, It was still in the folder.
After stopping the Filebeat, the file disappeared from the folder.
No extra data was written to the file... So it was inactive.

Ori

Yes, close_inactive replaces close_older. close_inactive is the time since the harvester was last reading a line (which is not necessarly the update time. Can you check the logs if you had some messages inside that the file was closed (or not)? Debug mode is probably needed for that.

Hi Ruflin,

No Close message....

It ends with:

2016-09-08T15:39:25+03:00 DBG output worker: publish 503 events
2016-09-08T15:39:25+03:00 DBG output worker: publish 503 events
2016-09-08T15:39:25+03:00 DBG ES Ping(url=http://localhost:9200, timeout=1m30s)
2016-09-08T15:39:25+03:00 DBG Ping status code: 200
2016-09-08T15:39:25+03:00 DBG PublishEvents: 503 metrics have been published to elasticsearch in 561.6ms.
2016-09-08T15:39:25+03:00 DBG send completed
2016-09-08T15:39:25+03:00 INFO Events sent: 504
2016-09-08T15:39:25+03:00 DBG Processing 504 events
2016-09-08T15:39:25+03:00 DBG New state added for C:\Program Files\filebeat\input\micrositehits.log
2016-09-08T15:39:25+03:00 DBG Write registry file: c:/Program Files/Filebeat/registry/filebeat_micrositehits.registry
2016-09-08T15:39:25+03:00 INFO Registry file updated. 1 states written.
2016-09-08T15:39:25+03:00 DBG delete old: remove c:/Program Files/Filebeat/registry/filebeat_micrositehits.registry.old: The system cannot find the file specified.
2016-09-08T15:39:25+03:00 DBG rotate to old: rename c:/Program Files/Filebeat/registry/filebeat_micrositehits.registry c:/Program Files/Filebeat/registry/filebeat_micrositehits.registry.old: The system cannot find the file specified.
2016-09-08T15:39:27+03:00 DBG End of file reached: C:\Program Files\filebeat\input\micrositehits.log; Backoff now.
2016-09-08T15:39:30+03:00 INFO Run prospector
2016-09-08T15:39:30+03:00 DBG Start next scan
2016-09-08T15:39:30+03:00 DBG Check file for harvesting: C:\Program Files\filebeat\input\micrositehits.log
2016-09-08T15:39:30+03:00 DBG Update existing file for harvesting: C:\Program Files\filebeat\input\micrositehits.log, offset: 309242
2016-09-08T15:39:30+03:00 DBG No updates needed, file C:\Program Files\filebeat\input\micrositehits.log is already harvested.
2016-09-08T15:39:30+03:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-09-08T15:39:35+03:00 DBG End of file reached: C:\Program Files\filebeat\input\micrositehits.log; Backoff now.
2016-09-08T15:39:37+03:00 DBG Flushing spooler because of timeout. Events flushed: 0

And then repeatedly writes:

2016-09-08T15:39:40+03:00 INFO Run prospector
2016-09-08T15:39:40+03:00 DBG Start next scan
2016-09-08T15:39:40+03:00 DBG Check file for harvesting: C:\Program Files\filebeat\input\micrositehits.log
2016-09-08T15:39:40+03:00 DBG Update existing file for harvesting: C:\Program Files\filebeat\input\micrositehits.log, offset: 309242
2016-09-08T15:39:40+03:00 DBG No updates needed, file C:\Program Files\filebeat\input\micrositehits.log is already harvested.
2016-09-08T15:39:45+03:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-09-08T15:39:45+03:00 DBG End of file reached: C:\Program Files\filebeat\input\micrositehits.log; Backoff now.

Ori

Same with Version 5 Alpha 5.

Looks like you are hitting an old bug you had in version 1.2.2 If I am not wrong......

Ori

OK, at first seems to be the same with alpha5,

But found:

2016-09-08T15:55:17+03:00 INFO Non-zero metrics in the last 30s: filebeat.harvester.open_files=-1 publish.events=1 registrar.state_updates=1 filebeat.harvester.closed=1 filebeat.harvester.running=-1

It looks ok now.

In a second test, found:

2016-09-08T16:46:01+03:00 INFO Read line error: file inactive
2016-09-08T16:46:01+03:00 DBG Stopping harvester for file: C:\Program Files\filebeat\input\micrositehits.log
2016-09-08T16:46:01+03:00 DBG Stopping harvester, closing file: C:\Program Files\filebeat\input\micrositehits.log
2016-09-08T16:46:01+03:00 DBG Update state: C:\Program Files\filebeat\input\micrositehits.log, offset: 309242
2016-09-08T16:46:06+03:00 DBG Flushing spooler because of timeout. Events flushed: 1

So it is now stopping to harvest the file.

Thanks!!!

That looks better :slight_smile:

One think that started worrying me when I was reading your log lines is this:

2016-09-08T15:39:25+03:00 DBG delete old: remove c:/Program Files/Filebeat/registry/filebeat_micrositehits.registry.old: The system cannot find the file specified.

It seems like the registry file can't be rewritten properly? Is that potentially because of manual edits?

Since I am testing, I am deleting all registry files before each run.
So it can be related.....

Ori

You remove it before start and do not edit it during running? If yes, it should not have any effect :frowning: As it does not always appear, I assume permissions are as expected? On shutdown, the registry file has the correct content?