I have a question about how Elasticsearch stores indexed data.
When we index data, it takes up some disk space to store each document. When we apply an analyzer, the space used changes, and the index can grow to several times the size of the original data in the database.
Can anyone explain the ratio, or the concept behind it?
We store the original document in _source, and then index every field according to its analyzer. So it really depends on what sort of analysis you are using.
Note that we do compress things, so that should help.
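To make the "depends on the analysis" point concrete, here is a rough Python sketch of how many terms an nGram or shingle step can emit for a short piece of text. It is only an approximation for illustration: the `ngrams` and `shingles` helpers are hypothetical and are not Lucene's implementation, and the sizes used (n-grams of length 1 to 2, two-word shingles) are assumed to match the filter defaults.

```python
def ngrams(term, min_gram=1, max_gram=2):
    """Return every character n-gram of `term` with length min_gram..max_gram."""
    return [
        term[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(term) - n + 1)
    ]


def shingles(tokens, min_size=2, max_size=2):
    """Return every word n-gram (shingle) of `tokens` with min_size..max_size words."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(min_size, max_size + 1)
        for i in range(len(tokens) - n + 1)
    ]


text = "quick brown fox"
words = text.split()                                  # 3 plain word tokens
ngram_terms = [g for w in words for g in ngrams(w)]   # n-grams of each word
shingle_terms = shingles(words)                       # adjacent word pairs

print(len(words), len(ngram_terms), len(shingle_terms))
```

The size on disk does not grow in direct proportion to these counts, because repeated terms share one entry in the term dictionary, but more emitted tokens generally means more postings and positions to store.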
Thanks, Mark Walkom (@warkolm).
I am using the following analyzers:
Shingle
Snowball
nGram
Raw
Do you have any idea of the ratio between the original data size and the index storage?
Also, do I need to change any settings to enable the compression you mentioned?
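On the compression question: as far as I know, stored fields (including _source) are compressed by default, so no setting is needed just to turn compression on. Elasticsearch 2.0 and later also accept an `index.codec` setting of `best_compression`, which trades some stored-field speed for a smaller index. A minimal sketch of the request body for creating such an index, built in Python; only the `index.codec` value comes from Elasticsearch, the rest is illustration:

```python
import json

# Body for an index-creation request. `index.codec` is a static setting,
# so it has to be set when the index is created (or while it is closed).
index_body = {
    "settings": {
        "index": {
            "codec": "best_compression",
        }
    }
}

# Print the JSON that would be sent as the request body.
print(json.dumps(index_body, indent=2))
```

There is no fixed ratio to quote. The most reliable approach is to index a representative sample with each analyzer and compare the store size reported by `GET _cat/indices?v`.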