Is the output able to keep up with the messages you are sending? Could you share some log files?
If the output is not keeping up, the reason for the growing memory is probably publish_async, as the number of items it needs to keep in memory grows over time. Be aware that this is an experimental feature.
Please also try to update to the most recent version of filebeat if possible for testing.
The output (Elasticsearch) uses about 12 of 24 CPU cores. Though the load on Elasticsearch is high, it should still be keeping up. I will update the filebeat version later.
As you can see in the log file, there are some issues when sending data to Elasticsearch. I strongly recommend you turn off publish_async, as this will be the cause of the "insane memory" usage. What was the reason you turned it on in the first place? Was events/s not high enough without it?
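For reference, disabling publish_async is a one-line change in filebeat.yml (a minimal sketch; all other settings stay as they are):

```yaml
filebeat:
  # publish_async is experimental and can cause unbounded memory growth
  # while the output falls behind; false (the default) restores bounded,
  # synchronous batch publishing.
  publish_async: false
```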
I would expect you to see some more details about the errors in the elasticsearch logs.
How many events per second do you have? Did you test if Filebeat is the limiting factor?
30k events per second per machine is one of our requirements, and I think filebeat is not the limiting factor; filebeat consumes about 1.5 cores to process those events. By the way, filebeat 5.0.0 can handle about 20k events per second per core, while filebeat 5.2.2 can only do 10k events per second per core.
What kind of pipeline are you using in elasticsearch?
A grok processor to extract fields from log events.
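For context, such an ingest pipeline might look like the following (a hypothetical sketch; the pipeline name, field names, and grok pattern are assumptions, not the poster's actual setup):

```json
PUT _ingest/pipeline/access-logs
{
  "description": "Extract fields from raw log lines",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{IPORHOST:client_ip} %{WORD:method} %{URIPATHPARAM:path} %{NUMBER:status:int}"
        ]
      }
    }
  ]
}
```

Filebeat would then reference this pipeline name in its Elasticsearch output so that each event is parsed on ingest.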
If filebeat is not the limiting factor, you should not need publish_async so the memory issue should disappear. Can you disable publish_async and confirm that?
Grok processor: A grok processor can be quite expensive. It seems you hit a limit there. I would recommend opening a discuss topic in the elasticsearch forum (with a link to this thread) including the above error, and checking what you can do to get rid of it.
5.0 to 5.2.2: That would be a regression. So the exact same setup leads to fewer events/s in 5.2.2 than in 5.0? Could you run this comparison with the file output to take output performance out of the equation?
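To take the output out of the equation for such a benchmark, filebeat can be pointed at the file output temporarily (a sketch; the path and rotation numbers are assumptions):

```yaml
output:
  file:
    # write events to local disk instead of Elasticsearch
    path: "/tmp/filebeat-bench"
    filename: filebeat.out
    # rotate at 100 MB and keep 7 files so the benchmark
    # does not fill the disk
    rotate_every_kb: 102400
    number_of_files: 7
```

Running both filebeat 5.0.0 and 5.2.2 against this output with the same input isolates any events/s difference to filebeat itself.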
If filebeat is not the limiting factor, you should not need publish_async so the memory issue should disappear. Can you disable publish_async and confirm that?
Sorry, I was not clear enough. I mean that filebeat does not run out of CPU resources when testing. In our production environment, we would like filebeat to use less CPU. After I disabled publish_async, the memory issue did indeed disappear.
Grok processor: A grok processor can be quite expensive. It seems you hit a limit there. I would recommend opening a discuss topic in the elasticsearch forum (with a link to this thread) including the above error, and checking what you can do to get rid of it.
Ok, I will do it.
5.0 to 5.2.2: That would be a regression. So the exact same setup leads to fewer events/s in 5.2.2 than in 5.0? Could you run this comparison with the file output to take output performance out of the equation?
Yes, exact same setup; I just replaced the filebeat binary.
publish_async is deprecated. It was experimental and led to too many problems. It's not a regression, as the bad behavior can occur with any release. You're highly encouraged to disable it. Instead, enable pipelining: true in the logstash output and set spooler_size > bulk_max_size.
Oh, I see. Well, the Elasticsearch output doesn't support pipelining. The best you can do is use multiple workers and set spooler_size > bulk_max_size. With N = spooler_size / bulk_max_size, the bigger N is, the more sub-batches will be processed concurrently (at the cost of increased memory usage).
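Putting that advice together, a sketch of the corresponding filebeat.yml (the host and the concrete numbers are assumptions chosen for illustration):

```yaml
filebeat:
  # N = spooler_size / bulk_max_size sub-batches per flush;
  # here N = 8192 / 2048 = 4
  spooler_size: 8192
  # keep the deprecated experimental mode off
  publish_async: false

output:
  elasticsearch:
    hosts: ["localhost:9200"]
    # multiple workers push sub-batches concurrently
    worker: 4
    bulk_max_size: 2048
```

Raising spooler_size (and thus N) or worker trades memory for concurrency; the right values depend on the Elasticsearch cluster's ingest capacity.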