Elevated Memory Utilization and Errors in Filebeat When Integrating External MISP CTI Log Source

Summary: Our Elastic setup incorporates two MISP CTI log sources, one internal and one external, both managed within the Henkel environment. After integrating these MISP instances via the Filebeat API method, we have identified a notable increase in memory utilization, but only for the external MISP log source. The surge occurs when the external MISP logs are shipped to the Logstash shippers, which makes us suspect a potential bug in the Filebeat component handling the external MISP.

Outlined below are the steps we undertook to address and investigate the issue:

Currently, the integration of these two MISP log sources is achieved through Filebeat using the API method.
A dedicated Filebeat server has been established, with distinct services for the INTERNAL MISP and the EXTERNAL MISP (a trimmed configuration sketch is shown after the test steps below).
The Filebeat services have been enabled, initiating the collection of logs from both MISP instances, which are then ingested into the Logstash shippers.
Post-ingestion, logs from both MISP instances are successfully ingested into Elastic SIEM. However, only the Filebeat service for the external MISP is experiencing high memory utilization of approximately 8 GB, coupled with errors observed on the Logstash shippers.
In an effort to isolate the issue and confirm its association with the external MISP, we conducted the following test:

a. The internal MISP server was installed on Server A and configured to send logs to logstash shippers LSS1 and LSS2.
b. Simultaneously, the external MISP server was installed on Server B, configured to send logs to the same logstash shippers (LSS1 and LSS2).
c. Despite these adjustments, the memory utilization for the external MISP Filebeat remained high, and the errors persisted on the Logstash shippers.
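
For reference, the Filebeat configuration for the external MISP service is roughly along the lines of the sketch below. This is a trimmed, illustrative sketch built around the httpjson input; the hostnames, interval, pagination values, and API key are placeholders rather than our exact values. The internal MISP service uses the same shape of configuration, pointed at the internal MISP URL.

# illustrative filebeat.yml sketch for the external MISP service
# (placeholder hostnames/values, not the exact production configuration)
filebeat.inputs:
  - type: httpjson
    interval: 5m
    request.method: POST
    request.url: https://misp-external.example.local/events/restSearch
    request.body:
      limit: 1000
      page: 1
    request.transforms:
      - set:
          target: header.Authorization
          value: <MISP_API_KEY>

output.logstash:
  # both Logstash shippers, with load balancing / fail-over
  hosts: ["lss1.example.local:5044", "lss2.example.local:5044"]
  loadbalance: true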

Filebeat version is 8.11

Github issue link - Elevated Memory Utilization and Errors in Filebeat When Integrating External MISP CTI Log Source · Issue #38053 · elastic/beats · GitHub





Hello,

Please do not share logs as screenshots, it makes them hard to read and impossible to search for keywords.

Can you share those logs as plaintext using the preformatted text option? The </> button.

Are Filebeat and Logstash running on the same server? It is not clear from your post.

Also, there is a possible memory leak in the httpjson input of Filebeat, which is the input used to make requests to APIs; the issue is here.

It looks like it was fixed two weeks ago and the fix was backported to 8.12.

Can you update your Filebeat to 8.12.2 and see if the issue persists?
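
On a Debian-based host installed from the Elastic apt repository, the upgrade would look roughly like this (adjust for your package manager and for the extra service units you created):

sudo apt-get update
sudo apt-get install filebeat=8.12.2
# restart the stock service plus any additional filebeat-* units you created for MISP
sudo systemctl restart filebeat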

So, if Filebeat and Logstash are running on the same machine and Filebeat has this memory leak, that could be what is causing the OOM errors you see in your Logstash logs.

Thanks for your response.

  1. Filebeat and Logstash are on different servers.
  2. Since both are running on different servers, shall I still upgrade to 8.12.2?

Apologies for uploading the logs as screenshots; this is my first time posting in the Elastic forum. Unfortunately, the screenshots were taken during our testing phase and, because we are undergoing some internal activity, we no longer have the facility to recreate and reshare the same logs. Going forward I shall share them as plaintext.

Thank you :slight_smile:

If they are on different servers, then you may have two different issues. Your Logstash logs clearly show an OOM issue, so you need to investigate what may be causing that OOM in Logstash, which would be unrelated to Filebeat.

How much memory are you allocating to Logstash in jvm.options?

As mentioned, there is a possible memory leak in the httpjson input of Filebeat, but if your Filebeat is not crashing with OOM errors, then the issue may not be in Filebeat at all, but on the Logstash side.

Hi,

I'm a colleague of Sunith, so please let me chip in. We have already tried a bunch of different scenarios, which makes us rather certain we are talking about a bug in Filebeat. :slight_smile:

First, regarding the jvm.options, here we have:

-Xms2g
-Xmx5g

on machines with 12 GB of physical memory. So, to my understanding, this should yield up to 5 GB of heap, up to another 5 GB of direct memory, and leave about 2 GB of breathing room for the VM running Logstash. The OOM errors in the screenshot relate to direct memory only and occur only in one particular scenario within our architecture.
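
For completeness, if we wanted to bound the direct memory explicitly instead of relying on the JVM default (which, as far as I understand, follows the max heap size), the jvm.options could be extended with something like the last line below. That line is a hypothetical addition and is not part of our current config:

-Xms2g
-Xmx5g
# hypothetical, not currently set: explicit cap for direct (off-heap) buffers
-XX:MaxDirectMemorySize=4g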

To describe this particular scenario a bit better, first some background on our architecture. We have:

  • Two separate instances of MISP (external and internal): one with a few open-source TI feeds integrated (external) and one with some closed-source TI (internal). Both MISP instances are hosted the same way and run the same MISP version.
  • Two instances of Logstash, which receive all kinds of logs; their sole task is to pass the received data on to a Kafka queue, with the rest of our setup behind that. Think really simplistic configs here, like one Beats/Agent input and one Kafka output per pipeline. Both Logstash instances are configured exactly the same way: same machine specs, configs, versions, etc.
  • In between, we usually have one Elastic Agent / Filebeat machine running to pick up certain log sources, e.g. both our MISP instances.

This is basically what @sunith described in the diagram labeled "Scenario 1". There we had two instances of Filebeat running on the same VM, each picking up one of the MISP instances and sending data to both our Logstash instances for balancing/fail-over purposes. Every time we started to ingest the external MISP instance, the memory utilization of the corresponding Filebeat instance skyrocketed, and in parallel the Logstash machines started to throw out-of-direct-memory errors while dumping the encoded MISP event content into the Logstash logs (see screenshot CTI logs). The internal MISP instance and its Filebeat, working exactly the same way in our architecture, showed no such issues whatsoever.

To narrow down the source of the issue, we tried several different configurations (with/without load-balancing to the Logstash machines, both Filebeats on the same VM or on separate ones, and so on).

We ended up with a new Filebeat instance on a freshly staged VM, talking to our external MISP on one end and one of the Logstash instances on the other. Once we start Filebeat there and begin reading data from the external MISP, the Filebeat memory goes up unreasonably (several GB instead of a few hundred MB) and Logstash throws the out-of-direct-memory errors along with the dumped MISP data. Doing the same with exactly the same setup but the internal MISP, everything stays sane and happy.

Our working assumption is based on the fact that we see the MISP content being dumped as error output in the Logstash logs, containing a lot of encoded JSON. We think something in the content of our external MISP might make the MISP event parsing in Filebeat fail, handing over too broad and "unseparated" data to Logstash, which then cannot process the oversized chunks.
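
One way we could sanity-check that assumption is to pull a single event page straight from the external MISP REST API and look at the raw response size, roughly like this (a sketch assuming the standard /events/restSearch endpoint; the URL and key are placeholders):

curl -sk \
  -H "Authorization: <MISP_API_KEY>" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"limit": 1, "page": 1}' \
  https://misp-external.example.local/events/restSearch | wc -c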

Hope that helps a bit to clarify where we are coming from. :slight_smile:

Regardless, we are first trying an update to the latest 8.12 version. Maybe the httpjson bug is related and the update already fixes it implicitly. In case this is not the solution, do you have any advice on how we can do some more low-level debugging?
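
For instance, would enabling the Beats HTTP/pprof endpoint on the misp-external Filebeat and pulling a heap profile while the memory climbs be a reasonable approach? Something along these lines, assuming the http.pprof settings behave the same way in 8.12 (untested on our side):

# in the filebeat.yml of the misp-external instance
http.enabled: true
http.host: 127.0.0.1
http.port: 6060
http.pprof.enabled: true

# then, while memory is climbing, grab and inspect a heap profile
curl -s -o heap.pprof http://127.0.0.1:6060/debug/pprof/heap
go tool pprof -top heap.pprof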

Best regards,
Sebastian

@leandrojmp We have upgraded our Filebeat server from 8.11 to 8.12.2 and we are still observing the same high memory utilization issue on the Filebeat server. Please find the output below.

root@filebeat_server:/# filebeat version
filebeat version 8.12.2 (amd64), libbeat 8.12.2 [0b71acf2d6b4cb6617bff980ed6caf0477905efa built 2024-02-15 13:39:15 +0000 UTC]
root@filebeat_server:/# systemctl status filebeat
● filebeat.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
     Loaded: loaded (/lib/systemd/system/filebeat.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2024-03-07 09:47:17 UTC; 39min ago
       Docs: https://www.elastic.co/beats/filebeat
   Main PID: 601 (filebeat)
      Tasks: 12 (limit: 9444)
     Memory: 143.9M
        CPU: 1.014s
     CGroup: /system.slice/filebeat.service
             └─601 /usr/share/filebeat/bin/filebeat --environment systemd -c /etc/filebeat/filebeat.yml --path.home /usr/share/filebeat --path.config /etc/filebeat --path.data /var/lib/filebeat --path.logs>

Mar 07 10:22:21 filebeat_server filebeat[601]: {"log.level":"info","@timestamp":"2024-03-07T10:22:21.733Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/r>
Mar 07 10:22:51 filebeat_server filebeat[601]: {"log.level":"info","@timestamp":"2024-03-07T10:22:51.733Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/r>
Mar 07 10:23:21 filebeat_server filebeat[601]: {"log.level":"info","@timestamp":"2024-03-07T10:23:21.736Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/r>
Mar 07 10:23:51 filebeat_server filebeat[601]: {"log.level":"info","@timestamp":"2024-03-07T10:23:51.733Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/r>
Mar 07 10:24:21 filebeat_server filebeat[601]: {"log.level":"info","@timestamp":"2024-03-07T10:24:21.733Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/r>
Mar 07 10:24:51 filebeat_server filebeat[601]: {"log.level":"info","@timestamp":"2024-03-07T10:24:51.734Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/r>
Mar 07 10:25:21 filebeat_server filebeat[601]: {"log.level":"info","@timestamp":"2024-03-07T10:25:21.733Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/r>
Mar 07 10:25:51 filebeat_server filebeat[601]: {"log.level":"info","@timestamp":"2024-03-07T10:25:51.735Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/r>
Mar 07 10:26:21 filebeat_server filebeat[601]: {"log.level":"info","@timestamp":"2024-03-07T10:26:21.733Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/r>
Mar 07 10:26:51 filebeat_server filebeat[601]: {"log.level":"info","@timestamp":"2024-03-07T10:26:51.736Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/r>

root@filebeat_server:/# systemctl status filebeat-misp-external
● filebeat-misp-external.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
     Loaded: loaded (/etc/systemd/system/filebeat-misp-external.service; disabled; vendor preset: enabled)
     Active: active (running) since Thu 2024-03-07 09:49:50 UTC; 37min ago
       Docs: https://www.elastic.co/products/beats/filebeat
   Main PID: 1001 (filebeat)
      Tasks: 12 (limit: 9444)
     Memory: 4.1G
        CPU: 2min 59.089s
     CGroup: /system.slice/filebeat-misp-external.service
             └─1001 /usr/share/filebeat/bin/filebeat --environment systemd -c /etc/filebeat-misp-external/filebeat.yml --path.home /usr/share/filebeat --path.config /etc/filebeat-misp-external --path.data >

Mar 07 10:22:20 filebeat_server filebeat[1001]: {"log.level":"info","@timestamp":"2024-03-07T10:22:20.951Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/>
Mar 07 10:22:50 filebeat_server filebeat[1001]: {"log.level":"info","@timestamp":"2024-03-07T10:22:50.951Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/>
Mar 07 10:23:20 filebeat_server filebeat[1001]: {"log.level":"info","@timestamp":"2024-03-07T10:23:20.950Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/>
Mar 07 10:23:50 filebeat_server filebeat[1001]: {"log.level":"info","@timestamp":"2024-03-07T10:23:50.948Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/>
Mar 07 10:24:20 filebeat_server filebeat[1001]: {"log.level":"info","@timestamp":"2024-03-07T10:24:20.950Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/>
Mar 07 10:24:50 filebeat_server filebeat[1001]: {"log.level":"info","@timestamp":"2024-03-07T10:24:50.948Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/>
Mar 07 10:25:20 filebeat_server filebeat[1001]: {"log.level":"info","@timestamp":"2024-03-07T10:25:20.951Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/>
Mar 07 10:25:50 filebeat_server filebeat[1001]: {"log.level":"info","@timestamp":"2024-03-07T10:25:50.948Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/>
Mar 07 10:26:20 filebeat_server filebeat[1001]: {"log.level":"info","@timestamp":"2024-03-07T10:26:20.950Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/>
Mar 07 10:26:50 filebeat_server filebeat[1001]: {"log.level":"info","@timestamp":"2024-03-07T10:26:50.950Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/>

root@filebeat_server:/#
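
The periodic monitoring lines above are truncated by journalctl; if it helps, the memory counters from the most recent one can be extracted roughly like this (a sketch that assumes the default JSON log layout and the documented libbeat memstats path):

journalctl -u filebeat-misp-external -o cat \
  | grep '"log.logger":"monitoring"' | tail -n 1 \
  | jq '.monitoring.metrics.beat.memstats'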

I wonder if your problem is similar to mine: Addressing Filebeat's memory leak and performance issues with high log volume - #3 by Adriann
