Reduce Disk Space Requirements

PARTH_GANDHI · October 17, 2014, 10:06pm

Details:
Elasticsearch version used: 1.3.4
Docs to index: ~2.2 million
Growth in docs: a few hundred docs every week.
Number of fields per doc: ~10-15
tokenizers used: ngram (min:2, max:15), path_hierarchy
filters used: word_delimiter, pattern_capture, lowercase, unique
Size on disk: ~ 150 GB (No replicas are active)

Problem:
Unfortunately, I don't have the luxury of a lot of free disk space at my
disposal.
Why? [Let me just say I work for a too-big-to-fail organization, if you
know what I mean :-)]
I need to reduce my index storage footprint by at least 50%.

Solutions tried:

Ran _flush and _optimize on the index. This didn't affect the size on disk.

Decreased the number of primary shards from 5 to 2 (I realized this was a
useless attempt, as the number of shards doesn't affect disk space).

Looked into archiving the index after closing it (I can't use this solution,
as I want our users to be able to search all 2.2 million docs, so I can't
archive only some of them).

Can you suggest any other options to reduce the index size on disk?
Your input is much appreciated.

Thanks,
Parth Gandhi

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e343c173-da25-4281-8909-cea62cfdf6f3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jprante · October 17, 2014, 10:17pm

ngram with min=2 kills your index space. Use min=3 or higher. The edge
ngram tokenizer might also be an alternative.

Jörg
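
To get a rough feel for why the ngram settings matter, here is a minimal sketch in plain Python that counts how many tokens a single word produces under each option. It is not Elasticsearch code and only approximates what the tokenizers emit; the helper names `ngrams` and `edge_ngrams` and the sample word are made up for illustration, and it ignores the other filters in the analysis chain (such as unique).

```python
def ngrams(text, min_gram, max_gram):
    """Return every substring of text whose length is in [min_gram, max_gram]."""
    return [
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    ]


def edge_ngrams(text, min_gram, max_gram):
    """Return only the prefixes of text whose length is in [min_gram, max_gram]."""
    longest = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, longest + 1)]


if __name__ == "__main__":
    word = "elasticsearch"  # hypothetical sample term
    print("ngram min=2, max=15:     ", len(ngrams(word, 2, 15)))
    print("ngram min=3, max=15:     ", len(ngrams(word, 3, 15)))
    print("edge ngram min=2, max=15:", len(edge_ngrams(word, 2, 15)))
    print("edge ngram min=3, max=15:", len(edge_ngrams(word, 3, 15)))
```

The full ngram count grows roughly with the square of the word length, while the edge ngram count grows linearly, which is why the edge variant tends to be much lighter when prefix matching is enough for the search use case.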

