Logstash Memory leak?


(Sumit Arora) #1

Hello,

We're running logstash 1.5.3 on windows 8.1 server 2012 having 8GB RAM. Logstash configured to have max 1g heap size. We're running quite a big job which runs for 10-12 hours to complete and creates multiple rolling logs with each processing thread. Logstash is pointing to logs folder and configured to parse it with each new line.

Configuration settings for reference:

input {
file {
path => "c:/temp/logs/*.txt"
type => "somelogs"
}
}

filter
{
if [type] == "somelogs"
{
grok {
match => [ "message", "%{DATESTAMP:logdate}\s%{NUMBER}:00\s[%{LOGLEVEL:loglevel}]\s[%{HOSTNAME:loghost}]\s[%{JAVACLASS:class}]\s[Process id: %{NUMBER:processId}]\s[Thread id: %{NUMBER:threadId}]\s%{GREEDYDATA:eventMessage}" ]
overwrite => [ "message" ]
}
}
}

output {

elasticsearch {
host => "127.0.0.1"
port => "80"
index => "elasticlogs"
protocol => "http"
manage_template => false
}

}

Logstash settings in setup.bat: set LS_MIN_MEM=256m and set LS_MAX_MEM=1g

Observation: After running the job for 3-4 hours logstash memory consumption is ever growing. It started with minimum under 250 MB and now reached to 1 gb but never come down. After 8-10 hrs the memory consumption reaches to 4 GB as well in previous runs. CPU consumption is well under 5 %.

My questions are:

  1. If GC is running why the memory is not getting released? My understanding is logstash is not a storage system and just parses logs in a batches of 50, 100 etc at a time?

  2. Does it need more heap size? Is there a tool to monitor logstash JVM? I tried jVisualVM and JCosole and none of them are detecting logstash java process.

  3. How to determine the Heap size required for logstash?

Help in this issue is greatly appreciated.


LS 2.0 & Java Memory Leak
(Mark Walkom) #2

Why not turn on GC logging and see what is happening?
However it won't use more than 1GB if that is the max you have set, you're likely looking at OS level stats which take into account caching that it does.


(Sumit Arora) #3

Yes that's a good point which I also considered to turn GC logging ON after the current job finishes.

Memory against logstash (java.exe) in performance monitor shows 4 GB. How to distinguish if it's OS memory? Any tool you suggest?


(Magnus Bäck) #4

What kind of memory are you tracking with Performance Monitor? Private memory? The working set size? Virtual memory?


(Sumit Arora) #5

Currently Private and Working set Memory are sitting at 900 MB mark and shareable is just 4 MB.


(Sumit Arora) #6

Latest Update: After running the job for 10-12 hours under heavy load, logstash process as seen under windows task manager goes upto 3 GB. (may be cache memory is also included but not sure).

Enabled GC logs and below are some of the output from gc logs which has GC allocation failures which I think it's quite normal according to this thread. http://stackoverflow.com/questions/28342736/java-gc-allocation-failure

: 70540K->599K(78656K), 0.0053430 secs] 143804K->73991K(253440K), 0.0054162 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
87348.061: Total time for which application threads were stopped: 0.0056754 seconds, Stopping threads took: 0.0000366 seconds

After a while once the job is finished, logstash process did come to it's normal memory consumption to 300 MB from 3 GB during heavy load.

Could someone help me to understand why windows logstash process shows 3 GB? (Does this memory also has cached memory as pointed by Mark? If yes how much cache memory is needed by logstash if it's configured to have 1 GB HEAP size? This would give some understanding while planning the server capacity.


(Stephen Zhou) #7

Hi folks,
Similar issue happened at my side, Logstash JVM setting by default(-Xms256m -Xmx1g), memory leak issue occurred on each PROD servers. For instance, one of my Web Server, which got 32GB mem total, java.exe process occupied 25G memory in 2 months, issue mitigated by kill java process manually at last:

I was checking this problem today, tried to fix it by modifying LS_MAX_MEM, parameters as below, however, it seems not working, Windows memory still leaking.
I have no idea how to resolve it now, I guess the only option for me is create a task schedule restart Logstash service every day or week for now. :frowning:

C:\logstash\bin\setup.bat

if "%LS_MIN_MEM%" == "" (
set LS_MIN_MEM=400m
)
if "%LS_MAX_MEM%" == "" (
set LS_MAX_MEM=400m
)

Both Xms and Xmx setting as 400m, Working set should be within 400M*1024=409600K, but Working Set and Private Working Set greater than it as below:

GC logging as below:

Java HotSpot(TM) 64-Bit Server VM (25.45-b02) for windows-amd64 JRE (1.8.0_45-b15), built on Apr 30 2015 12:40:44 by "java_re" with MS VC++ 10.0 (VS2010)
Memory: 4k page, physical 33541208k(28359468k free), swap 37733676k(32562904k free)
CommandLine flags: -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=419430400 -XX:InitialTenuringThreshold=1 -XX:MaxHeapSize=419430400 -XX:MaxNewSize=139812864 -XX:MaxTenuringThreshold=1 -XX:NewSize=139812864 -XX:OldPLABSize=16 -XX:OldSize=279617536 -XX:+PrintClassHistogram -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:SurvivorRatio=8 -XX:ThreadStackSize=2048 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:-UseLargePagesIndividualAllocation -XX:+UseParNewGC 
0.705: Total time for which application threads were stopped: 0.0001910 seconds, Stopping threads took: 0.0000241 seconds
...
4.182: Total time for which application threads were stopped: 0.0001699 seconds, Stopping threads took: 0.0000342 seconds
4.264: [GC (CMS Initial Mark) [1 CMS-initial-mark: 0K(273088K)] 52455K(395968K), 0.0343662 secs] [Times: user=0.09 sys=0.00, real=0.03 secs] 
4.299: Total time for which application threads were stopped: 0.0351564 seconds, Stopping threads took: 0.0001477 seconds
4.299: [CMS-concurrent-mark-start]
4.300: [CMS-concurrent-mark: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
4.300: [CMS-concurrent-preclean-start]
4.302: [CMS-concurrent-preclean: 0.002/0.002 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
4.302: [CMS-concurrent-abortable-preclean-start]
4.375: Total time for which application threads were stopped: 0.0005524 seconds, Stopping threads took: 0.0000842 seconds
...
4299.401: Total time for which application threads were stopped: 0.0194531 seconds, Stopping threads took: 0.0000322 seconds
4308.326: [GC (Allocation Failure) 4308.326: [ParNew
**strong text**Desired survivor size 6979584 bytes, new threshold 1 (max 1)
- age   1:     320960 bytes,     320960 total
: 109771K->469K(122880K), 0.0115409 secs] 191521K->82289K(395968K), 0.0116299 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
4308.338: Total time for which application threads were stopped: 0.0120255 seconds, Stopping threads took: 0.0000568 seconds
4317.235: [GC (Allocation Failure) 4317.235: [ParNew
Desired survivor size 6979584 bytes, new threshold 1 (max 1)
- age   1:     376192 bytes,     376192 total
: 109717K->422K(122880K), 0.0114861 secs] 191537K->82310K(395968K), 0.0115837 secs] [Times: user=0.09 sys=0.00, real=0.02 secs] 

OS: Windows Server 2003 and Windows Server 2008 R2
Logstash Version: logstash-1.5.4
JDK: jre1.8.0_60


(Robert Toellner) #8

I'll be watching this thread quite closely as well as my configuration is very similar to Sumit's and the issue is the same. I'm having to restart the Logstash service daily to free up memory at which time the steady/consistent climb of memory use begins. We will be increasing memory on our server, but according to Stephen, doesn't appear that it will matter other than delay the recurrence of service restarts.

My setup is quite simple, single Windows ELK 8gb ( to be increased ) server watching log4net files shared between two app servers. I have done nothing to the default logstash setup though I will begin looking into setup.bat settings.

Windows Server 2008R2
JDK 1.7.0_80
Logstash 1.5.4

Config looks like:
input {
file {
path => ["//server1/d$/appname/Prod/logs/logname*.txt", "//server2/d$/appname/Prod/logs/logname*.txt"]
start_position => "beginning"
tags => ["appname", "logname", "Prod"]
type => "log4net"
codec => multiline {
pattern => "^%{TIMESTAMP_ISO8601} "
negate => true
what => previous
}
}
}

filter {
if [type] == "log4net" {
grok {
match => [ "message", "(?m)%{TIMESTAMP_ISO8601:sourceTimestamp} [%{NUMBER:threadId}] %{LOGLEVEL:level} %{DATA:logger} %{GREEDYDATA:tempMessage}" ]
tag_on_failure => ["error_message_not_parsed"]
break_on_match => false
}
grok {
match => [ "path", "//(?[^/]+)" ] # parse the servername from the path
tag_on_failure => ["path_not_parsed"]
}
mutate {
replace => [ "message" , "%{tempMessage}" ]
remove_field => [ "tempMessage" ]
}

}
}


(Sumit Arora) #9

Further to my last update, I also looked into the Logstash logs which was throwing “failed to open files” error as below

Below are the errors from Logstash logs

{:timestamp=>"2015-09-07T09:42:34.065000+1000", :message=>"failed to open c:/temp/logs/infinity-20150903_19624.txt: c:\temp\logs\infinity-20150903_19624.txt", :level=>:warn}
{:timestamp=>"2015-09-07T09:42:34.065000+1000", :message=>"failed to open c:/temp/logs/infinity-20150903_19625.txt: c:\temp\logs\infinity-20150903_19625.txt", :level=>:warn}
{:timestamp=>"2015-09-07T09:42:38.221000+1000", :message=>"failed to open c:/temp/logs/infinity-20150903_19626.txt: c:\temp\logs\infinity-20150903_19626.txt", :level=>:warn}

I think (may be wrong) since Logstash is trying to read the opened files as it’s written, the threads goes into wait state which results into too many open threads and memory increases significantly. After a while, the process threads never recovers/closed as a result the RAM consumption is too high.

Can you guys also look into the logs to see if you get "failed to open files" error? This might be windows specific file locking issue reported by Windows users only and not reported by Linux based environments.


(Stephen Zhou) #10

I am using logstash to collect Windows IIS Log, which(mtime+3) will be removed at 01:00AM every day by task schedule.
Similar with Sumit, logstash thrown "failed to open files" exception when them be removed, no idea why logstash threads still need to read/lock these old logs.

{:timestamp=>"2015-09-09T01:12:09.453000+0800", :message=>"failed to open D:\\iislog\\[HOST]\\W3SVC2/u_ex150906.log: D:\\iislog\\[HOST]\\W3SVC2\\u_ex150906.log", :level=>:warn}
{:timestamp=>"2015-09-10T01:15:19.339000+0800", :message=>"failed to open D:\\iislog\\[HOST]\\W3SVC2/u_ex150907.log: D:\\iislog\\[HOST]\\W3SVC2\\u_ex150907.log", :level=>:warn}
{:timestamp=>"2015-09-11T01:09:48.044000+0800", :message=>"failed to open D:\\iislog\\[HOST]\\W3SVC2/u_ex150908.log: D:\\iislog\\[HOST]\\W3SVC2\\u_ex150908.log", :level=>:warn}
{:timestamp=>"2015-09-12T01:11:17.170000+0800", :message=>"failed to open D:\\iislog\\[HOST]\\W3SVC2/u_ex150909.log: D:\\iislog\\[HOST]\\W3SVC2\\u_ex150909.log", :level=>:warn}
{:timestamp=>"2015-09-13T01:17:33.506000+0800", :message=>"failed to open D:\\iislog\\[HOST]\\W3SVC2/u_ex150910.log: D:\\iislog\\[HOST]\\W3SVC2\\u_ex150910.log", :level=>:warn}
{:timestamp=>"2015-09-14T01:09:08.504000+0800", :message=>"failed to open D:\\iislog\\[HOST]\\W3SVC2/u_ex150911.log: D:\\iislog\\[HOST]\\W3SVC2\\u_ex150911.log", :level=>:warn}
{:timestamp=>"2015-09-15T01:07:01.040000+0800", :message=>"failed to open D:\\iislog\\[HOST]\\W3SVC2/u_ex150912.log: D:\\iislog\\[HOST]\\W3SVC2\\u_ex150912.log", :level=>:warn}

(Pier-Hugues Pellerin) #11

When Logstash is taking too much memory can you take a heap dump and post is somewhere?
I would be really interested to see what is the memory usage in the JVM.

To get the process id on windows.

PS C:\> Get-WmiObject Win32_Process -Filter "CommandLine like '%elasticsearch%' OR CommandLine like '%logstash%'" | select ProcessId,CommandLine

To get a heap dump
jmap -dump:live,format=b,file=heap.hprof <PID>


(Skuky12 Sudy1) #12

hello , I'm the same problem,Java memory has been growing , take up the space is 100%.

windows server 2008 , logstash2.0.0 java version "1.8.0_65" Physical memory 8G

Do you have a solution ???


(Gaetano Parrello) #13

I 've the same problem
but....
i think I'm on the right path :smile:
my config file is like yours: in path=> I define a path putting in the end /logname*.txt

in this way logstash stay in tail in all the file at the same time (I 've seen it from the debug file) and the memory consuption goes on and on to infinity.
If I change the path configuration with /logname-20151117-*.txt, logstash stay in tail only on the today file and it keep using just 300-400mb of memory.. It doesn't go on and on to infinity and beyond :smile:

Is there a way to indicate a better range in the path configuration?


(Aaron Mildenstein) #14

Hello, fellow Logstashers. @jsvd spent days going down the rabbit hole to find this. He raised the alarm when he found that the memory leak in Windows is a JRuby bug, which you can read about here. We will release an updated version of Logstash with an updated JRuby as soon as one becomes available.


(Sumit Arora) #15

Appreciate the efforts!!. Looking forward for the fix.


(Gaetano Parrello) #16

@theuntergeek thanks a lot untergeek!
and thanks a lot to @jsvd for his effort.
I am on this trouble for 1 week and it seemed impossible to resolve it, even monitoring everything!

I'll wait for the next update before to pass in production step.

great job... and great community... See you soon!


(Skuky12 Sudy1) #17

elasticsearch2.0 w3c logformat have a problem ,

[mapper [@timestamp] is used by multiple types. Set update_all_types to true to update [format] across all types.]

filter {
#ignore log comments
if [message] =~ "^#" {
drop {}
}
grok {
# check that fields match your IIS log settings
match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IPORHOST:site} %{WORD:method} %{URIPATH:page} %{NOTSPACE:querystring} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clienthost} %{NOTSPACE:useragent} %{NUMBER:response}"]
}
date {
match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
timezone => "Etc/UTC"
}
useragent {
source=> "useragent"
prefix=> "browser"
}


(Mark Walkom) #18

I don't think this relates to this topic does it? Please start your own thread instead.


(Diego Parra) #19

It seems that the 1.7.23 verion of jRuby fixes the memory leak ([https://github.com/jruby/jruby/issues/3446])

so I hope it should be matter of days to see the fix in logstash :smile:

I'll be looking forward for the fix


(Robert Toellner) #20

Cheers! :+1: