Elasticsearch storage usage

I am working on an application that uses Elasticsearch as its primary
database. I have been running some inserts on an index and am seeing
storage grow to 3x the size of the raw files. I am not sure if this is
normal for a full-text search system. Is there anything I can do to make
some things less searchable, or otherwise keep the data size smaller?

--

Lucene indexes are a form of append-only database: new data is written
to a new file (a segment), and once certain conditions are met the index is
compacted by merging several files into yet another file, skipping the
removed entries along the way. That means old versions of the data lie
around until compaction. I think you can tune ES to run the Lucene
compactions (merges) more often.
There's also index compression, cf.
https://groups.google.com/d/msg/elasticsearch/3hNu6GPd4pE/I0Kcm_inyHQJ
And of course the inverted indexes have some overhead of their own, as they
need to store document IDs and statistics for every term.
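
To make the append-only behaviour concrete, here is a minimal toy model in
Python. It is only a sketch: the ToyIndex class and its methods are made up
for illustration and are not Lucene's or Elasticsearch's actual data
structures. It just shows why disk usage can sit well above the live data
size until a merge runs.

```python
class ToyIndex:
    """Hypothetical append-only store: writes create new segments,
    updates only mark the old copy as deleted."""

    def __init__(self):
        self.segments = []    # each segment is a dict {doc_id: text}, never modified
        self.deleted = set()  # (segment_no, doc_id) pairs marked as removed

    def add(self, doc_id, text):
        # An update is "mark the old version deleted" plus "append a new segment".
        for seg_no, seg in enumerate(self.segments):
            if doc_id in seg and (seg_no, doc_id) not in self.deleted:
                self.deleted.add((seg_no, doc_id))
        self.segments.append({doc_id: text})

    def size_on_disk(self):
        # Deleted entries still occupy space until a merge rewrites the segments.
        return sum(len(text) for seg in self.segments for text in seg.values())

    def merge(self):
        # Copy only live entries into one new segment, dropping the deleted ones.
        merged = {}
        for seg_no, seg in enumerate(self.segments):
            for doc_id, text in seg.items():
                if (seg_no, doc_id) not in self.deleted:
                    merged[doc_id] = text
        self.segments = [merged]
        self.deleted.clear()


idx = ToyIndex()
idx.add("1", "a" * 100)
idx.add("1", "b" * 100)   # new version of doc 1; the old copy stays on disk
idx.add("2", "c" * 50)
before = idx.size_on_disk()
idx.merge()
after = idx.size_on_disk()
print(before, after)
```

In this toy run the stale copy of document 1 is only reclaimed by merge(),
which matches the idea that the storage ratio can drop on its own once
merges have caught up.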

Just my two cents.

On Wednesday, December 26, 2012, 12:33:11 UTC+4, Wojons Tech wrote:

I am working on an application that uses Elasticsearch as its primary
database. I have been running some inserts on an index and am seeing
storage grow to 3x the size of the raw files. I am not sure if this is
normal for a full-text search system. Is there anything I can do to make
some things less searchable, or otherwise keep the data size smaller?

--

AFAIK, Lucene has one of the most compact index formats. Other full-text
search engines usually have bigger indexes. See, for example,
http://taschenorakel.de/mathias/2012/04/18/fulltext-search-benchmarks/

I am not sure if this is normal for a full-text search system

--

Okay, that makes sense. I am seeing the document size ratio going down; it
is now much closer to using twice as much space, and not more than that.

On Wednesday, December 26, 2012 4:41:56 AM UTC-8, Artem Grinblat wrote:

AFAIK, Lucene has one of the most compact index formats. Other full-text
search engines usually have bigger indexes. See, for example,
http://taschenorakel.de/mathias/2012/04/18/fulltext-search-benchmarks/

I am not sure if this is normal for a full-text search system

--

Hi,

You probably also have _source enabled, may have individual fields
marked as stored, and maybe you also have _all enabled?
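
As a rough illustration, the sketch below builds a mapping that turns off
_all and marks some fields as not indexed. It assumes the mapping syntax of
Elasticsearch releases from around the time of this thread ("string" fields,
"index": "no"); later releases changed this (for example "index": false, and
_all was eventually removed), so check the docs for your version. The
build_mapping helper, the type name "doc" and the field names are all made
up for the example. Since ES is your primary database, you probably want to
leave _source enabled, otherwise the original documents cannot be fetched
back.

```python
import json


def build_mapping(type_name, searchable_fields, unsearchable_fields):
    """Hypothetical helper: returns a mapping body as a plain dict.

    Assumes old-style mapping syntax; adjust for your Elasticsearch version.
    """
    properties = {}
    for name in searchable_fields:
        # Analyzed and searchable (the default for string fields).
        properties[name] = {"type": "string"}
    for name in unsearchable_fields:
        # Still returned as part of _source, but no inverted index is built.
        properties[name] = {"type": "string", "index": "no"}
    return {
        type_name: {
            "_all": {"enabled": False},  # skip the catch-all copy of every field
            "properties": properties,
        }
    }


# Field names below are placeholders, not taken from the original post.
mapping = build_mapping("doc", ["title"], ["raw_payload"])
print(json.dumps(mapping, indent=2, sort_keys=True))
```

The printed JSON is what you would send as the mapping body when creating
the index; fields listed as unsearchable cost only their share of _source.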

Otis



--