I am working on an application that uses Elasticsearch as its primary
database. After running some inserts on an index, I am seeing storage grow
to about 3x the size of the raw files. I am not sure whether this is normal
for a full-text search system. Is there anything I can do to make some
things less searchable, or otherwise keep the data size smaller?
Lucene indexes are a form of append-only database: new data is written to
a new file (a segment), and once certain conditions are met the index is
compacted by merging several files into yet another file, skipping the
removed entries along the way. That means old versions of data stay on
disk until compaction. I think you can tune ES to run the Lucene
compactions more often.
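To make the append-only behaviour concrete, here is a rough toy model in Python. It is only a sketch: the ToyIndex class and its methods are made up for illustration and are not Lucene's or Elasticsearch's actual data structures or API. It just shows why disk usage stays high until a merge rewrites the segments.

```python
# Toy model of an append-only index. Hypothetical names throughout;
# this is NOT how Lucene stores data, only an illustration of the idea.
class ToyIndex:
    def __init__(self):
        self.segments = []    # each segment is a write-once list of (doc_id, text)
        self.deleted = set()  # doc ids marked as removed

    def flush(self, docs):
        """Write a batch of new documents as a brand-new segment."""
        self.segments.append(list(docs))

    def delete(self, doc_id):
        """Mark a document as removed; its data stays in the old segment."""
        self.deleted.add(doc_id)

    def size_on_disk(self):
        """Total characters stored across all segments, live or removed."""
        return sum(len(text) for seg in self.segments for _, text in seg)

    def merge(self):
        """Rewrite all segments into one, skipping the removed entries."""
        live = [(d, t) for seg in self.segments for d, t in seg
                if d not in self.deleted]
        self.segments = [live]
        self.deleted.clear()


index = ToyIndex()
index.flush([(1, "aaaa"), (2, "bbbb")])
index.delete(1)                  # document 1 is removed...
index.flush([(3, "cccc")])       # ...and a newer version arrives as document 3
before = index.size_on_disk()    # the removed document still takes up space
index.merge()
after = index.size_on_disk()     # space is reclaimed only after the merge
print(before, after)             # prints "12 8"
```

In this toy, an update is modelled as a delete plus a fresh insert, so a workload with many updates keeps stale copies around until the next merge.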
There's also index compression, cf. https://groups.google.com/d/msg/elasticsearch/3hNu6GPd4pE/I0Kcm_inyHQJ
And of course the inverted indexes have some overhead of their own as they
need to store document ids and statistics for every term.
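As a rough illustration of that overhead, here is a tiny inverted index in Python. It is a sketch under simplifying assumptions (whitespace tokenization, term frequency as the only statistic, a made-up build_inverted_index helper); a real Lucene index stores more than this and encodes it far more compactly.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term_frequency} (hypothetical helper)."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
    return postings


docs = {
    1: "the quick fox",
    2: "the lazy dog",
    3: "the fox and the dog",
}
index = build_inverted_index(docs)

# Every (term, document) pair costs one posting entry on top of the raw text.
entries = sum(len(per_doc) for per_doc in index.values())
print(len(index), "terms,", entries, "posting entries")
print(index["the"])  # doc ids and frequencies kept for a single term
```

Each distinct term carries its own list of document ids and counts, so the index grows with the number of distinct (term, document) pairs and not only with the raw text size.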
Just my two cents.
(In reply to the original question posted by Wojons Tech on Wednesday, December 26, 2012 3:33:11 AM UTC-5.)