I am posting several large text files, each about 20~30 MB of plain text,
into ES, and I use the attachment mapper as the field type to store these
files.
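For context, the setup is roughly like this (a sketch using the Python
elasticsearch client; the index, type, and field names are placeholders,
and the "attachment" type requires the mapper-attachments plugin):

    import base64
    from elasticsearch import Elasticsearch

    es = Elasticsearch()  # assumes a local node on :9200

    # Create an index whose "file" field uses the attachment type
    es.indices.create(index="docs", body={
        "mappings": {
            "doc": {
                "properties": {
                    "file": {"type": "attachment"}
                }
            }
        }
    })

    # The attachment field expects the raw bytes base64-encoded
    with open("big.txt", "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")

    es.index(index="docs", doc_type="doc", body={"file": encoded})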
It costs a lot of memory. Even when I post a single file, the used heap
grows from about 150 MB to 250 MB. BTW, I use the default tokenizer for
this field. I understand the file can generate many tokens, but what I
don't understand is the memory cost. Does ES keep all the tokens in memory?
This time, I read the string content out of the file and pushed it into a
field of a document whose analyzer is standard.
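Concretely, this second attempt is more or less the following (again a
sketch; the index and field names are illustrative):

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # Read the whole ~20-30 MB file into one string
    with open("big.txt", "r", encoding="utf-8") as f:
        text = f.read()

    # "content" is a plain string field using the standard analyzer
    es.index(index="docs", doc_type="doc", body={"content": text})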
Without the attachment mapper, the same thing happens: I get
"java.lang.OutOfMemoryError: Java heap space" when the total index is only
400 MB and the document count is just 10.
What are the suggestions for handling these large text files?
I am considering a smarter analyzer which might eliminate some
redundancies, but are there any other options?
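By a smarter analyzer I mean something along these lines (just a sketch of
the idea: a stopword filter plus a length filter so fewer unique terms are
produced per document; the names "lean" and "trim_length" are made up):

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    es.indices.create(index="docs2", body={
        "settings": {
            "analysis": {
                "analyzer": {
                    "lean": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "stop", "trim_length"]
                    }
                },
                "filter": {
                    # Drop extremely short or long tokens
                    "trim_length": {"type": "length", "min": 2, "max": 64}
                }
            }
        },
        "mappings": {
            "doc": {
                "properties": {
                    "content": {"type": "string", "analyzer": "lean"}
                }
            }
        }
    })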
cheers,
Ivan