Elasticsearch indexing storage mechanism

(Raj Kumar Natarajan) #1

Hello Buddies,

I have a question about how Elasticsearch stores indexed data.
For example, when we index data, it takes up some disk space to store each document. When we apply an analyzer, the space used changes, and the index can grow to several times the size of the original data in the database.

Can anyone explain the ratio, or the concept behind it?

Thanks in advance.

(Mark Walkom) #2

We do store the original document in _source, and then index all the fields as processed by the analyser. So it really depends on what sort of analysis you are using.

Note that we do compress things, so that should help.
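
To give a rough feel for why the analysis choice matters so much, here is a minimal sketch in plain Python. It does not call Elasticsearch; the `ngrams` and `shingles` helpers are hypothetical stand-ins that only approximate what an nGram tokenizer and a shingle filter emit, and the gram sizes (1 to 3) and shingle size (2) are assumptions picked for illustration. It counts how many terms one short string turns into.

```python
def ngrams(text, min_gram=1, max_gram=3):
    """Return every character n-gram of text with length min_gram..max_gram."""
    return [
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    ]


def shingles(tokens, size=2):
    """Return word-level n-grams (shingles) of the given size, space-joined."""
    return [
        " ".join(tokens[i:i + size])
        for i in range(len(tokens) - size + 1)
    ]


text = "quick brown fox"
words = text.split()

# Apply the simplified n-gram step to each word separately.
word_ngrams = [gram for word in words for gram in ngrams(word)]
word_shingles = shingles(words)

print("plain words:", len(words))        # 3
print("n-gram terms:", len(word_ngrams))  # 30
print("shingle terms:", len(word_shingles))  # 2
```

In this toy case three words become thirty n-gram terms, so a field analysed that way has far more to index than the same field kept as a single raw value. The count of terms is not the same as bytes on disk (repeated terms are stored once per segment and the data is compressed), so treat this only as an intuition for why no fixed ratio exists.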

(Raj Kumar Natarajan) #3

Thanks, Mark Walkom. (@warkolm)
I am using the following analyzers:

  1. Shingle
  2. Snowball
  3. nGram
  4. Raw.

Do you have any idea of the ratio between the size of the original data and the size of the index?
Also, do I need to change any settings to enable the compression that you mentioned?
