You don't need FSRiver to index binary content. Have a look at the Mapper Attachments Type plugin for Elasticsearch: https://github.com/elastic/elasticsearch-mapper-attachments
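For illustration, here is a minimal sketch of what that could look like with the 0.90-era Java client used elsewhere in this thread. The index name "docs", type "doc" and field "file" are just placeholders, and the mapper-attachments plugin must be installed on the node:

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class CreateAttachmentIndex {
    public static void main(String[] args) {
        // Connect to a local node (TransportClient API from the 0.90 era).
        Client client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

        // Map the "file" field as an attachment so the plugin runs Tika
        // on the Base64 payload you send in that field.
        String mapping = "{ \"doc\": { \"properties\": {"
                + " \"file\": { \"type\": \"attachment\" }"
                + "} } }";

        client.admin().indices().prepareCreate("docs")
                .addMapping("doc", mapping)
                .execute().actionGet();

        client.close();
    }
}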
About how to encode in Base64, here is how I do it in FSRiver:
https://github.com/dadoonet/fsriver/blob/master/src/main/java/fr/pilato/elasticsearch/river/fs/river/FsRiver.java#L672
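If you want to do the same thing outside FSRiver, here is a minimal JDK-only sketch (java.util.Base64 needs Java 8; FSRiver itself uses the Base64 helper bundled with Elasticsearch, as in the link above). Note that both the raw bytes and the encoded string have to fit in the heap at the same time:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class EncodeFile {
    public static void main(String[] args) throws Exception {
        // Read the whole file into memory; for very large files this is
        // exactly where the heap pressure comes from.
        byte[] raw = Files.readAllBytes(Paths.get("sample.txt"));

        // Base64 inflates the size by roughly 4/3.
        String encoded = Base64.getEncoder().encodeToString(raw);

        // The encoded string is what goes into the attachment field
        // (e.g. "file") of the document you index.
        System.out.println("Encoded length: " + encoded.length());
    }
}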
About splitting into multiple parts: I'm afraid neither fsriver nor mapper-attachments supports that.
I think you could reduce bulk_size to 1. That keeps fewer documents in memory at once, and perhaps you will then be able to index your big document.
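As a sketch, registering the river with a reduced bulk_size might look like this with the Java client. I'm assuming bulk_size sits under the index section as in the fsriver README of that time, and the path, index and type names are placeholders, so please check your version's documentation:

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class RegisterFsRiver {
    public static void main(String[] args) {
        Client client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

        // River definition: scan /path/to/docs and push documents one by one
        // (bulk_size: 1) so only a single big document sits in each bulk.
        String meta = "{"
                + " \"type\": \"fs\","
                + " \"fs\": { \"url\": \"/path/to/docs\", \"update_rate\": 3600000 },"
                + " \"index\": { \"index\": \"docs\", \"type\": \"doc\", \"bulk_size\": 1 }"
                + "}";

        // Rivers are registered by indexing their _meta document.
        client.prepareIndex("_river", "mydocs", "_meta")
                .setSource(meta)
                .execute().actionGet();

        client.close();
    }
}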
HTH
--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs
On 14 Sep 2013, at 09:08, Ajitpal Singh ajit.kamboj@gmail.com wrote:
How can we create a Base64 version of a file and parse it using fsriver?
Can we have multi-part indexing? [Just a foolish thought :}] Suppose we have a file sample.txt (2 GB), and we assume that if the index size is greater than 20 MB (just an assumption), then we send the data to different shards with names like sample_1.txt holding the first 20 MB of the index, and so on...
I think some kind of parent mapping that includes the multiple parts could do the work. Then when we query, it would return the data for the parts inside the parent mapping, or join the results of the different sub-parts.
thanks
Ajit
On Friday, 13 September 2013 04:19:11 UTC+1, David Pilato wrote:
I think you need to allocate more memory to your JVM.
The river needs to load the file, create a Base64 version of it, and send it to Elasticsearch, which could be the same node the river is running on. I suspect you need more than 3 times as much memory as your biggest file: the raw bytes, the Base64 copy (roughly 4/3 larger), and the request that wraps it all sit in the heap at the same time.
That said, I'm wondering if I could optimize something in fsriver. Could you open an issue in the fsriver project?
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 13 Sep 2013, at 03:00, Ajitpal Singh ajit....@gmail.com wrote:
Hello All,
I'm parsing text files using fsriver. Recently I had a 1.2 GB txt file, and when fsriver tried to read the data, it threw the following error in the logs.
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid3230.hprof ...
Heap dump file created [550996745 bytes in 16.871 secs]
Exception in thread "elasticsearch[NC ESserver][fs_slurper][T#1]" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at fr.pilato.elasticsearch.river.fs.river.FsRiver$FSParser.indexFile(FsRiver.java:1226)
at fr.pilato.elasticsearch.river.fs.river.FsRiver$FSParser.addFilesRecursively(FsRiver.java:720)
at fr.pilato.elasticsearch.river.fs.river.FsRiver$FSParser.IndexFileSystem(FsRiver.java:649)
at fr.pilato.elasticsearch.river.fs.river.FsRiver$FSParser.run(FsRiver.java:415)
at java.lang.Thread.run(Thread.java:724)
I have tried changing the JVM and ES settings, but nothing works.
1: using commandline = bin/elasticsearch -f -Xmx2g -Xms2g -Des.index.storage.type=memory
2: tried setting JVM using : set JAVA_OPTS=-Xmx2g -Xms2g
How can I solve this problem?
Thanks,
Ajit