Logstash going OOM while dumping data from ES to CSV


(ankur) #1

Hello all,

I have 60 time-based indices in my cluster, containing 200 million documents (150 GB) in total.
I am trying to export some of that data (around 15 million documents) from these indices to a CSV file using Logstash.
Logstash goes OOM when I try to dump all indices at the same time, although it works fine with one index at a time.

bin/logstash.bat agent -f myconfig.config

io/console not supported; tty will not be manipulated
Logstash startup completed
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1900.hprof ...
Heap dump file created [359502197 bytes in 3.316 secs]
Exception in thread "<elasticsearch" java.lang.UnsupportedOperationException
at java.lang.Thread.stop(Thread.java:869)
at org.jruby.RubyThread.exceptionRaised(RubyThread.java:1221)
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:112)
at java.lang.Thread.run(Thread.java:745)
Logstash shutdown completed

Config:

input {
  elasticsearch {
  }
}

output {
  csv {
  }
}

Logstash 1.5
Elasticsearch 1.6
Total memory of cluster : 22 GB

What do I have to do to resolve this?
Or is there any other way to export ES data to a CSV/text file?

Thanks.
Ankur


(Mark Walkom) #2

Based on that config you are taking everything from ES and exporting it.

You may want to either increase the heap you're giving to LS, or reduce the size option of the elasticsearch input (the number of hits fetched per scroll request) from its default.
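On the "any other way to export" question, here is a minimal Python sketch of the same idea outside Logstash: walk a scroll one page at a time and write each page to the CSV before fetching the next, so memory use is bounded by the page size rather than by the total hit count. The names export_hits and scroll_pages are made up for illustration, and the search and scroll callables are hypothetical stand-ins for whatever client issues the real requests. It assumes a regular scroll (not search_type=scan) whose first response already contains hits, and responses shaped like ES scroll responses with _scroll_id and hits.hits.

```python
import csv
import io


def scroll_pages(search, scroll):
    """Yield one page of hits at a time until the scroll is exhausted.

    `search()` starts the scroll and `scroll(scroll_id)` fetches the next
    page. Both are hypothetical callables supplied by the caller; each must
    return a dict shaped like an Elasticsearch scroll response.
    """
    response = search()
    while response["hits"]["hits"]:
        yield response["hits"]["hits"]
        response = scroll(response["_scroll_id"])


def export_hits(pages, fields, out):
    """Write a header plus one CSV row per hit; return the number of rows."""
    writer = csv.writer(out, delimiter=",", lineterminator="\r\n")
    writer.writerow(fields)
    count = 0
    for hits in pages:
        # Only the current page is held in memory at this point.
        for hit in hits:
            source = hit.get("_source", {})
            # Missing fields become empty cells instead of raising.
            writer.writerow([source.get(field, "") for field in fields])
            count += 1
    return count
```

To use it against a real cluster you would pass two small functions that send the initial search (with your query, index pattern, page size and scroll timeout) and the follow-up scroll requests, and open the output file with newline="" so the CRLF terminator is written unchanged.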


(ankur) #3

Thanks for your reply.
Here is my actual config:

input {
  elasticsearch {
    hosts => "10.10.8.14"
    query => '{ "query": { "match": { "FileType": "TS_FILE" } } }'
    index => "logstash-srsdb-*"
    size => 10
    scroll => "30s"
  }
}

output {
  #stdout {}
  csv {
    fields => ["Col1", "Col5", "Col7"]
    path => "D:\elastic_dump\dump.csv"
    csv_options => {"col_sep" => "," "row_sep" => "\r\n"}
  }
}

I have reduced size from the default to 10. Logstash is no longer going OOM, but the dump now takes a long time.

How do I increase the heap size of Logstash? I tried setting LS_HEAP_SIZE as an environment variable, but no luck.

One more thing: with the line csv_options => {"col_sep" => "," "row_sep" => "\r\n"}, row_sep is written to the file as the literal text "\r\n" instead of a line break.
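To show what that symptom looks like, here is a small Python analogy (not Logstash's own Ruby CSV code). It assumes the cause is that the escape sequence in the config string is not interpreted, so the row separator arrives as the four characters backslash, r, backslash, n rather than as a real CR+LF. That cause is an assumption and is not confirmed anywhere in this thread. The render helper is made up for the example.

```python
import csv
import io


def render(rows, row_sep):
    """Return the CSV text for `rows` using `row_sep` as the row terminator."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=",", lineterminator=row_sep).writerows(rows)
    return buf.getvalue()


rows = [["a", "b"], ["c", "d"]]

# Uninterpreted escape: the separator is four ordinary characters (\ r \ n),
# so every row ends up on one physical line with "\r\n" visible as text.
literal = render(rows, "\\r\\n")

# Interpreted escape: the separator is a real carriage return plus line feed.
real = render(rows, "\r\n")

print(repr(literal))
print(repr(real))
```

The first result is a single line with the separator printed as text, which matches the output described above; the second splits into two rows as intended.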

Thanks.
Ankur

