After about 4-5 hours of running Logstash 5.2.1 on Windows I get this error:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to C:\logstash-5.2.1/heapdump.hprof ...
Unable to create C:\logstash-5.2.1/heapdump.hprof: File exists
The signal INT is in use by the JVM and will not work correctly on this platform
19:03:58.270 [[main]>worker1] ERROR logstash.pipeline - Exception in pipelineworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash. {"exception"=>java.lang.OutOfMemoryError: Java heap space
Error: Your application used more memory than the safety cap of 1G.
Specify -J-Xmx####m to increase it (#### = cap size in MB)
I have already edited file C:\logstash-5.2.1\config\jvm.options and changed the line to
-Xmx3g
However, as you can see in the error above, it's still only using 1GB for heap. Why aren't the options in jvm.options taking effect? Do I need to edit logstash.conf somehow to point to jvm.options?
Thanks.
I did some looking, and the issue is that jvm.options is not currently used on Windows; it's Linux/UNIX only. If you need to change JVM options on Windows, look at editing the bin/setup.bat file, or alternatively set the options as environment variables ahead of time. See https://github.com/elastic/logstash/blob/v5.2.1/bin/setup.bat for more information.
In particular, see lines 27-29. You can work around this by setting LS_HEAP_SIZE as an environment variable, or by changing the default in that file. Using the environment variable is probably the wiser choice.
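For example, on Windows you could set the variable in the same Command Prompt session before launching Logstash. This is just a sketch; the 3g value and the config path are illustrative, so substitute your own:

```shell
rem Set the heap size for this session only (read by bin\setup.bat in Logstash 5.x),
rem then start Logstash with your pipeline config.
set LS_HEAP_SIZE=3g
bin\logstash.bat -f C:\logstash-5.2.1\config\logstash.conf
```

Setting it per-session avoids permanently changing system-wide environment variables; use the System Properties dialog (or `setx`) if you want it to persist across sessions.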
UPDATE: Since 6.1, Logstash on Windows also makes use of the jvm.options file.
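So on 6.1 and later you can set the heap directly in config\jvm.options on any platform. A minimal example (the 3g sizes are illustrative; setting -Xms equal to -Xmx avoids heap resizing at runtime):

```
## Initial and maximum heap size (keep them equal)
-Xms3g
-Xmx3g
```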
Those enormous, multi-line grok statements are probably why you're exhausting your heap space. Logstash has to stage events and test each of those possible patterns in order until a match is found. For such a scenario, an even bigger heap than 3g would probably be needed. However, there are better alternatives.
I noticed a great deal of repetition in those grok lines. The NetScaler lines all seem to begin with the same prefix (I didn't verify, but it appears that way):
Why repeat that prefix over and over in each pattern? That just forces Logstash to do much more work.
If they all start with that prefix, you'd be better served splitting the grok work: start with the dissect filter, then use grok (or dissect again) on only the remaining portion of each line that differs. This is much more efficient and far less likely to consume enormous amounts of heap space. You can, of course, still use grok alone for this multiple-split approach, but the dissect filter is considerably faster than grok for these use cases and requires far less memory.
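As a sketch of that approach (the field names %{ts}, %{host}, and %{rest} and the leading timestamp-plus-host layout are assumptions for illustration, not your actual NetScaler format), dissect peels off the shared prefix once, and a second filter handles only the tail:

```
filter {
  # Split off the common prefix by position; "rest" captures everything after it.
  dissect {
    mapping => {
      "message" => "%{ts} %{+ts} %{host} %{rest}"
    }
  }
  # Only the differing tail of each line needs pattern matching now.
  grok {
    match => { "rest" => "%{GREEDYDATA:detail}" }
  }
}
```

Because dissect splits on fixed delimiters rather than running regular expressions, the expensive grok matching is confined to the short, variable part of each event.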