Reduce Disk Space Requirements

Details:
Elasticsearch version used: 1.3.4
Docs to index: ~ 2.2 Million
Growth in docs: a few hundred docs every week.
Number of fields per doc: ~10-15
tokenizers used: ngram (min:2, max:15), path_hierarchy
filters used: word_delimiter, pattern_capture, lowercase, unique
Size on disk: ~ 150 GB (No replicas are active)
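Roughly, the analysis settings look like this (analyzer and tokenizer names are simplified here, and the pattern_capture and path_hierarchy parts are omitted):

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_ngram": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      },
      "analyzer": {
        "my_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram",
          "filter": ["word_delimiter", "lowercase", "unique"]
        }
      }
    }
  }
}
```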

Problem:
Unfortunately, I don't have the luxury of a lot of free disk space at my
disposal.
Why? [Let me just say I work for a too-big-to-fail organization, if you
know what I mean :-)]
I need to reduce my index storage footprint by at least 50%.

Solutions tried:

  1. Ran _flush & _optimize on the index. Neither affected the size on disk.
  2. Decreased the number of primary shards from 5 to 2 (I realized this was
    a pointless attempt, as the number of shards doesn't affect total disk
    space).
  3. Looked into closing and archiving the index (I can't use this solution,
    as I want our users to search through all 2.2 million docs, so I can't
    archive a subset of them).
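For reference, step 1 amounted to something like the following ("my_index" is a placeholder for the real index name):

```shell
# Flush the translog to disk
curl -XPOST 'localhost:9200/my_index/_flush'
# Force-merge each shard down to a single segment; without
# max_num_segments=1, _optimize may do very little on an
# index that is already well merged.
curl -XPOST 'localhost:9200/my_index/_optimize?max_num_segments=1'
```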

Can you guys suggest any other options to reduce index disk size?
Your inputs are much appreciated.

Thanks,
Parth Gandhi

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e343c173-da25-4281-8909-cea62cfdf6f3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

An ngram tokenizer with min=2 kills your index space: a token of length n
produces on the order of n² grams. Use min=3 or higher. The edge_ngram
tokenizer might also be an alternative, since it only emits grams anchored
at the start of each token.
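For example (a sketch, not a drop-in config; the tokenizer names are illustrative):

```json
{
  "analysis": {
    "tokenizer": {
      "trigram_plus": {
        "type": "ngram",
        "min_gram": 3,
        "max_gram": 15
      },
      "prefix_grams": {
        "type": "edge_ngram",
        "min_gram": 2,
        "max_gram": 15
      }
    }
  }
}
```

Raising min_gram from 2 to 3 drops all the 2-grams, which are the most numerous and least selective terms. edge_ngram goes further: a token of length n yields at most n-1 grams instead of O(n²), but it only supports prefix-style matching, so it is a fit only if your users search from the start of a term.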

Jörg

On Sat, Oct 18, 2014 at 12:06 AM, PARTH GANDHI parth.gandhi85@gmail.com
wrote:

[original message quoted above]
