Details:
Elasticsearch version used: 1.3.4
Docs to index: ~2.2 million
Growth in docs: a few hundred docs every week
Number of fields per doc: ~10-15
Tokenizers used: ngram (min_gram: 2, max_gram: 15), path_hierarchy
Filters used: word_delimiter, pattern_capture, lowercase, unique
Size on disk: ~150 GB (no replicas are active)
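To give a sense of how much the ngram settings above can multiply the number of indexed terms, here is a rough sketch. It assumes the tokenizer emits every substring of each length between min_gram and max_gram for a single input string, and it counts terms before the unique filter removes duplicates, so it is an upper bound rather than a measurement of my index. The function name ngram_count is just for illustration.

```python
def ngram_count(text_length, min_gram=2, max_gram=15):
    """Upper bound on character n-grams emitted for one input string.

    Assumes every substring of length min_gram..max_gram is emitted,
    and ignores de-duplication by a later `unique` filter.
    """
    total = 0
    for n in range(min_gram, max_gram + 1):
        # A string of length L has L - n + 1 substrings of length n
        # (and none at all when n is longer than the string).
        total += max(text_length - n + 1, 0)
    return total


if __name__ == "__main__":
    for length in (5, 20, 50):
        print(length, ngram_count(length))
```

So a single 20-character value can produce well over a hundred terms, which is probably a large part of the 150 GB.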
Problem:
Unfortunately, I don't have the luxury of a lot of free disk space at my
disposal.
Why? [Let me just say I work for a too-big-to-fail organization, if you
know what I mean :-)]
I need to reduce my index storage footprint by at least 50%.
Solutions tried:
Ran _flush and _optimize on the index. This didn't affect the size on disk.
Decreased the number of primary shards from 5 to 2 (I realized this was a
useless attempt, as the number of shards doesn't affect disk space).
Looked into closing and archiving the index (I can't use this solution
because I want our users to search through all of the 2.2 million docs, so
I can't archive only part of them).
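For reference, the _optimize attempt looked roughly like the sketch below. The host, the index name "docs", and the choice of max_num_segments=1 are placeholders and assumptions for illustration, not my exact setup; the helper names are made up as well.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def optimize_url(host, index, max_num_segments=1):
    """Build the URL for the Elasticsearch 1.x _optimize endpoint."""
    query = urlencode({"max_num_segments": max_num_segments})
    return "{}/{}/_optimize?{}".format(host.rstrip("/"), index, query)


def run_optimize(host, index, max_num_segments=1):
    """POST to _optimize and return the raw JSON response as text."""
    request = Request(
        optimize_url(host, index, max_num_segments),
        data=b"",          # empty body; the parameters are in the query string
        method="POST",
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical host and index name.
    print(run_optimize("http://localhost:9200", "docs"))
```

Merging segments this way only reclaims space held by deleted docs, which would explain why it made no difference for an index that is almost all live documents.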
Can you guys suggest any other options to reduce the index size on disk?
Your input is much appreciated.