I am posting several large text files, each about 20~30 MB of plain text,
into ES, and I use the attachment mapper as the field type to store these
files.
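For context, the setup is roughly like this (a sketch using the Python
elasticsearch client; the index, type, and field names are placeholders,
and the "attachment" type requires the mapper-attachments plugin):

    import base64
    from elasticsearch import Elasticsearch

    es = Elasticsearch()  # assumes a local node on :9200

    # Create an index whose "file" field uses the attachment type
    es.indices.create(index="docs", body={
        "mappings": {
            "doc": {
                "properties": {
                    "file": {"type": "attachment"}
                }
            }
        }
    })

    # The attachment field expects the raw bytes base64-encoded
    with open("big.txt", "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")

    es.index(index="docs", doc_type="doc", body={"file": encoded})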
It costs a lot of memory. Even when I post a single file, the used heap
grows from about 150 MB to 250 MB. BTW, I use the default tokenizer for
this field. I understand the file can generate many tokens, but what I
don't understand is the memory cost. Does ES keep all the tokens in memory?
This time, I read the string content out of the file and pushed it into a
field of a document whose analyzer is standard.
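Concretely, this second attempt is more or less the following (again a
sketch; the index and field names are illustrative):

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # Read the whole ~20-30 MB file into one string
    with open("big.txt", "r", encoding="utf-8") as f:
        text = f.read()

    # "content" is a plain string field using the standard analyzer
    es.index(index="docs", doc_type="doc", body={"content": text})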
Without the attachment mapper, the same thing happens: I get
"java.lang.OutOfMemoryError: Java heap space" when the total index is only
400 MB and the document count is just 10.
What are the suggestions for handling these large text files?
I am considering a smarter analyzer which might eliminate some
redundancies, but are there any other options?
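By a smarter analyzer I mean something along these lines (just a sketch of
the idea: a stopword filter plus a length filter so fewer unique terms are
produced per document; the names "lean" and "trim_length" are made up):

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    es.indices.create(index="docs2", body={
        "settings": {
            "analysis": {
                "analyzer": {
                    "lean": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "stop", "trim_length"]
                    }
                },
                "filter": {
                    # Drop extremely short or long tokens
                    "trim_length": {"type": "length", "min": 2, "max": 64}
                }
            }
        },
        "mappings": {
            "doc": {
                "properties": {
                    "content": {"type": "string", "analyzer": "lean"}
                }
            }
        }
    })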
cheers,
Ivan