Index Size difference while using keyword and standard analyzer


(Arulkumar) #1

Hi all,
I am indexing a file that is 20 MB in size. With the standard
analyzer the index took 87 MB, but with the keyword analyzer it took
84 MB. Why does it take this much space? Which analyzer is best with
respect to index size?


(ppearcy) #2

The choice of analyzer should really be driven by your search requirements.

But to give a (non) answer to your question, it all depends on your
data and you should test different ones out and see what works best
for you.

For example, I think that adding a stemmer should decrease the index
size, since it reduces the overall number of distinct terms.
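To illustrate the term-count point, here is a rough sketch in plain Python. It does not use Elasticsearch or Lucene at all: `tokenize` and `toy_stem` are made-up helpers, and the suffix-stripping rule is a crude stand-in for a real stemmer, so treat the numbers as illustrative only.

```python
import re

def tokenize(text):
    # Rough approximation of word-level tokenization: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def toy_stem(token):
    # Hypothetical, very naive stemmer: strip one common English suffix
    # if at least three characters remain. Real stemmers are far more careful.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

text = "Indexing indexed indexes index searching searched searches search"
tokens = tokenize(text)

unique_terms = set(tokens)                       # distinct terms without stemming
stemmed_terms = {toy_stem(t) for t in tokens}    # distinct terms after stemming

print(len(unique_terms), "distinct terms without stemming")
print(len(stemmed_terms), "distinct terms with stemming")
```

Fewer distinct terms generally means a smaller term dictionary, but how much that matters for total index size depends on your data, so it is still worth measuring on a real sample.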

Regards,
Paul



(ppearcy) #3

I should have also mentioned that the following field mapping
parameters can increase or decrease the index size. Whether or not you
need them should be determined by your search requirements.

  • term_vector
  • omit_norms
  • omit_term_freq_and_positions
  • store

from this URL: http://www.elasticsearch.com/docs/elasticsearch/mapping/core_types/
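As a rough sketch of how those four parameters might look together in a mapping, here is a Python snippet that builds the JSON body. The type name `doc` and field name `body` are hypothetical, and the value spellings (`"no"` for `store` and `term_vector`, booleans for the two `omit_*` settings) are assumptions based on the core types docs linked above, so check them against the version you are running.

```python
import json

# Hypothetical type ("doc") and field ("body"); only the four parameter
# names come from the list above. Values are assumed, not verified
# against a specific Elasticsearch version.
mapping = {
    "doc": {
        "properties": {
            "body": {
                "type": "string",
                "store": "no",                         # do not store the field separately
                "term_vector": "no",                   # no term vectors
                "omit_norms": True,                    # drop norms (no length normalization)
                "omit_term_freq_and_positions": True,  # drop freqs/positions (no phrase queries)
            }
        }
    }
}

mapping_json = json.dumps(mapping, indent=2)
print(mapping_json)
```

Each of these trades search functionality for space, which is why the settings should follow from what you need to query.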

And don't forget about _source compression:
http://www.elasticsearch.com/docs/elasticsearch/mapping/source_field/


