Error: Your application used more memory than the safety cap of 4G


(José Rebelo) #1

Hello,

we have been dealing with this error from the very beginning (basically ever since installing Logstash and starting to feed it data).

We changed -Xms and -Xmx from the initial 1 GB to 2 GB and then to 4 GB, and we still hit this error, almost always at the same clock time: if we restart ELK at 9-10 AM, then Logstash will crash with this error around 11 PM.

Error: Your application used more memory than the safety cap of 4G.
Specify -J-Xmx####M to increase it (#### = cap size in MB).
Specify -w for full java.lang.OutOfMemoryError: Java heap space stack trace
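(For reference: in Logstash 5.x/6.x this heap cap is normally raised in config/jvm.options rather than by passing -J-Xmx directly. A sketch; the file path depends on how Logstash was installed:)

```
# config/jvm.options -- path varies by install
# (e.g. /etc/logstash/jvm.options for deb/rpm packages)
-Xms4g
-Xmx4g

# Or override for a single run via the environment:
#   LS_JAVA_OPTS="-Xms4g -Xmx4g" bin/logstash
```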

We are using the Jenkins Logstash plugin (1.2.0) to send data to Logstash (6.0.0).

The logstash plugins versions we have are the following:

logstash-codec-cef (5.0.1-java)
logstash-codec-collectd (3.0.7)
logstash-codec-dots (3.0.5)
logstash-codec-edn (3.0.5)
logstash-codec-edn_lines (3.0.5)
logstash-codec-es_bulk (3.0.5)
logstash-codec-fluent (3.1.4-java)
logstash-codec-graphite (3.0.4)
logstash-codec-json (3.0.4)
logstash-codec-json_lines (3.0.4)
logstash-codec-line (3.0.4)
logstash-codec-msgpack (3.0.6-java)
logstash-codec-multiline (3.0.7)
logstash-codec-netflow (3.7.0)
logstash-codec-plain (3.0.4)
logstash-codec-rubydebug (3.0.4)
logstash-filter-aggregate (2.7.0)
logstash-filter-anonymize (3.0.5)
logstash-filter-cidr (3.1.1-java)
logstash-filter-clone (3.0.4)
logstash-filter-csv (3.0.6)
logstash-filter-date (3.1.8)
logstash-filter-de_dot (1.0.2)
logstash-filter-dissect (1.1.1)
logstash-filter-dns (3.0.6)
logstash-filter-drop (3.0.4)
logstash-filter-elasticsearch (3.2.0)
logstash-filter-fingerprint (3.1.1)
logstash-filter-geoip (5.0.1-java)
logstash-filter-grok (3.4.3)
logstash-filter-jdbc_streaming (1.0.2)
logstash-filter-json (3.0.4)
logstash-filter-kv (4.0.2)
logstash-filter-metrics (4.0.4)
logstash-filter-mutate (3.1.6)
logstash-filter-ruby (3.0.4)
logstash-filter-sleep (3.0.5)
logstash-filter-split (3.1.4)
logstash-filter-syslog_pri (3.0.4)
logstash-filter-throttle (4.0.3)
logstash-filter-translate (3.0.3)
logstash-filter-truncate (1.0.3)
logstash-filter-urldecode (3.0.5)
logstash-filter-useragent (3.2.1-java)
logstash-filter-xml (4.0.4)
logstash-input-beats (5.0.2-java)
logstash-input-dead_letter_queue (1.1.1)
logstash-input-elasticsearch (4.1.0)
logstash-input-exec (3.1.4)
logstash-input-file (4.0.3)
logstash-input-ganglia (3.1.2)
logstash-input-gelf (3.0.6)
logstash-input-generator (3.0.4)
logstash-input-graphite (3.0.4)
logstash-input-heartbeat (3.0.4)
logstash-input-http (3.0.6)
logstash-input-http_poller (4.0.3)
logstash-input-imap (3.0.4)
logstash-input-jdbc (4.3.0)
logstash-input-kafka (8.0.2)
logstash-input-pipe (3.0.5)
logstash-input-rabbitmq (6.0.1)
logstash-input-redis (3.1.5)
logstash-input-s3 (3.1.7)
logstash-input-snmptrap (3.0.4)
logstash-input-sqs (3.0.5)
logstash-input-stdin (3.2.4)
logstash-input-syslog (3.2.2)
logstash-input-tcp (5.0.2-java)
logstash-input-twitter (3.0.6)
logstash-input-udp (3.1.2)
logstash-input-unix (3.0.5)
logstash-mixin-aws (4.2.3)
logstash-mixin-http_client (6.0.1)
logstash-mixin-rabbitmq_connection (5.0.0-java)
logstash-output-cloudwatch (3.0.6)
logstash-output-csv (3.0.5)
logstash-output-elasticsearch (9.0.0-java)
logstash-output-email (4.0.6)
logstash-output-file (4.1.1)
logstash-output-graphite (3.1.3)
logstash-output-http (5.1.0)
logstash-output-kafka (7.0.4)
logstash-output-lumberjack (3.1.5)
logstash-output-nagios (3.0.4)
logstash-output-null (3.0.4)
logstash-output-pagerduty (3.0.5)
logstash-output-pipe (3.0.4)
logstash-output-rabbitmq (5.0.2-java)
logstash-output-redis (4.0.2)
logstash-output-s3 (4.0.12)
logstash-output-sns (4.0.5)
logstash-output-sqs (5.0.1)
logstash-output-stdout (3.1.2)
logstash-output-tcp (5.0.1)
logstash-output-udp (3.0.4)
logstash-output-webhdfs (3.0.4)
logstash-patterns-core (4.1.2)

We have seen many old reports about this issue, but for older versions of Logstash, where supposedly the next release fixed a "bug" in some multiline plugin... In our case we are using the latest version of everything!

We are using JDK 8u151:

openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

In a docker image container which is OS version:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.2 LTS"
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

Can you please give us some hint on how to solve this issue for good?

Thank you in advance.

Regards,
José Rebelo


(Magnus Bäck) #2

What does your pipeline configuration look like?


(José Rebelo) #3

Hi there,

we have the following configurations:

input {
  udp {
    codec => plain
    port => 1514
    type => doc
    queue_size => 72000
    receive_buffer_bytes => 31457280
    buffer_size => 65536
  }
}

output {
  elasticsearch {
    hosts => ["0.0.0.0:9200"]
    index => "logstash-jenkins"
  }
  stdout { codec => rubydebug }
}


(Magnus Bäck) #4

Perhaps Logstash isn't processing the events fast enough so your buffers are filling up? What kind of inbound message rate do you have? Is ES keeping up?


(José Rebelo) #5

Hello,

I'm not sure I follow. We are using the Jenkins plugin for Logstash, which uses the syslog indexer type. The protocol is UDP (RFC 5424). The messages are sent from hundreds of Jenkins jobs, each with hundreds if not thousands of lines...

We don't have control over the cadence of the Logstash calls because the jobs run in continuous integration mode.

I don't think this has to do with ES, because we are experiencing the Java heap space issue in LS itself... Isn't that so?

Meanwhile, I've also noticed an additional output log message that might be useful...

Could this have to do with the fact that we have several filter files? I'm asking because of what I see in the previous message:

By the way, we have migrated everything (LS + ES + Kibana) to version 6.1.2 and the problem is the same...

Regards,
José Rebelo


(Magnus Bäck) #6

You've configured the udp input to have an internal queue of 72k messages. If that queue fills up because Logstash isn't able to process events (possibly because ES isn't able to accept messages sufficiently quickly) then Logstash is going to use some memory. However, even if the messages are 1k each the whole queue shouldn't use that much more than 72 MB. Sure, there's overhead but hardly enough to fill a 4 GB JVM heap. Perhaps there's a memory leak somewhere?
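The estimate above can be checked with a quick back-of-the-envelope calculation (the ~1 KiB average message size is an assumption, and per-object JVM overhead is ignored):

```python
# Back-of-the-envelope check: a completely full udp-input queue of
# 72,000 messages at an assumed ~1 KiB each.
queue_size = 72_000           # queue_size from the udp input config
bytes_per_message = 1024      # assumed average message size

total_mib = queue_size * bytes_per_message / (1024 * 1024)
print(f"{total_mib:.1f} MiB")  # ~70 MiB -- nowhere near a 4 GB heap
```

So even a saturated queue accounts for only a tiny fraction of the heap, which is why a leak elsewhere is the more likely suspect.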


(José Rebelo) #7

Hi,

I think we have figured it out.

Basically, we had an exception in logstash.stdout (during ELK startup) stating that the file logstash-slowlog-plain.log could not be created: the ELK user lacked permissions on the "log" folder where that file was supposed to be created.

After fixing the permissions issue we no longer see that exception in the log, and ELK has been running for two days without interruption (which we never managed until now).
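For anyone hitting the same thing, a quick way to check for this failure mode, sketched against a scratch directory (the real log path and the logstash user/group are assumptions that depend on your install):

```shell
LOGDIR=/tmp/logstash-logs-demo   # hypothetical scratch dir standing in
mkdir -p "$LOGDIR"               # for the real Logstash log folder

# Can the user running Logstash create the slowlog file here?
if touch "$LOGDIR/logstash-slowlog-plain.log" 2>/dev/null; then
    echo "log directory is writable"
else
    # On the real host, ownership is the usual fix (names assumed):
    #   sudo chown -R logstash:logstash /path/to/logstash/logs
    echo "log directory is NOT writable"
fi
```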

Regards,
José Rebelo


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.