OutOfMemory error using fsriver?

Hello All,

I'm parsing text files using fsriver. Recently I got a 1.2 GB txt file, and when fsriver tried to read it, it threw the following error in the logs:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid3230.hprof ...
Heap dump file created [550996745 bytes in 16.871 secs]
Exception in thread "elasticsearch[NC ESserver][fs_slurper][T#1]" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at fr.pilato.elasticsearch.river.fs.river.FsRiver$FSParser.indexFile(FsRiver.java:1226)
at fr.pilato.elasticsearch.river.fs.river.FsRiver$FSParser.addFilesRecursively(FsRiver.java:720)
at fr.pilato.elasticsearch.river.fs.river.FsRiver$FSParser.IndexFileSystem(FsRiver.java:649)
at fr.pilato.elasticsearch.river.fs.river.FsRiver$FSParser.run(FsRiver.java:415)
at java.lang.Thread.run(Thread.java:724)

I have tried changing the JVM and ES settings, but nothing works:
1. Using the command line: bin/elasticsearch -f -Xmx2g -Xms2g -Des.index.storage.type=memory

2. Setting the JVM options with: set JAVA_OPTS=-Xmx2g -Xms2g

How do I solve this problem?

Thanks,
Ajit

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I think you need to allocate more memory to your JVM.
The river needs to load the file, create a Base64 version of it, and send it to ES, which could be the same node the river is running on. I suspect you need more than three times as much memory as your biggest file.

That said, I'm wondering if I could optimize something in fsriver. Could you open an issue in the fsriver project?
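To make that memory arithmetic concrete, here is a minimal sketch (not fsriver's actual code; it uses java.util.Base64 from Java 8 purely for illustration) of why the encoded copy alone adds roughly a third on top of the raw bytes, both of which sit on the heap at the same time:

```java
import java.util.Base64;

public class Base64Overhead {
    public static void main(String[] args) {
        // Pretend this is the raw content of a file loaded into memory.
        byte[] raw = new byte[3 * 1024 * 1024];

        // Base64 turns every 3 input bytes into 4 output bytes (~4/3 expansion).
        byte[] encoded = Base64.getEncoder().encode(raw);

        System.out.println("raw bytes:     " + raw.length);     // 3145728
        System.out.println("encoded bytes: " + encoded.length); // 4194304
    }
}
```

For a 1.2 GB file that is ~1.2 GB of raw bytes plus ~1.6 GB of encoded data before the JSON request buffer is even built, so a 2 GB heap cannot hold it.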

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


How can we create a Base64 version of a file and parse it using fsriver?

Can we have multi-part indexing? [Just a thought :)] Suppose we have a file sample.txt (2 GB), and we assume that if the index size is > 20 MB, the data is sent to different shards with names like sample_1.txt holding the first 20 MB of the index, and so on...

I think some kind of parent mapping that contains the multiple parts could do the work. When we query, it would return the data for the parts inside the parent mapping, or join the results of the different sub-parts.

Thanks,
Ajit


You don't need FSRiver to index binary content. Have a look at this project: GitHub - elastic/elasticsearch-mapper-attachments: Mapper Attachments Type plugin for Elasticsearch
About how to encode in Base64, here is how I do it in FSRiver:
https://github.com/dadoonet/fsriver/blob/master/src/main/java/fr/pilato/elasticsearch/river/fs/river/FsRiver.java#L672
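For reference, a minimal standalone equivalent looks like the sketch below (an illustration only, using Java 8's java.util.Base64 rather than the helper FsRiver actually calls at that line):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class EncodeFile {
    public static void main(String[] args) throws IOException {
        // Read the whole file into memory, then encode it. Holding both the
        // raw bytes and the encoded string at once is what exhausts the heap
        // for very large files.
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        String content = Base64.getEncoder().encodeToString(raw);
        System.out.println("encoded length: " + content.length());
    }
}
```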

As for splitting into multiple parts: I'm afraid that neither fsriver nor the mapper-attachments plugin supports it.

You could, I think, reduce bulk_size to 1. That would hold fewer documents in memory at once, and perhaps you will be able to index your big document.
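For example, the river _meta settings could look something like this (a sketch only; the exact field layout, including whether bulk_size sits under "index", should be checked against the fsriver README):

```json
{
  "type": "fs",
  "fs": {
    "url": "/path/to/docs",
    "update_rate": 900000
  },
  "index": {
    "index": "docs",
    "type": "doc",
    "bulk_size": 1
  }
}
```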

HTH

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs
