Storage Ratios - my syslog streams are expanding in Elasticsearch to more than 10:1?


(Dylan Johnson) #1

I have a directory that has 10GB of data and I used Logstash to parse the
data into Elasticsearch, which works fine. I have 2 indices and 0 replicas;
however, after Logstash has finished parsing the data into ES, the ES /data
directory is 80GB?

This is unworkable. What's the reason for this?

Thanks

Dylan

--


(Otis Gospodnetić) #2

Hello Dylan,

ES stores the original JSON in the special _source field. That right there
means your index will be at least 10GB in size. Additionally, it is
possible your fields are also stored and not just indexed - the stored part
would be another 10GB. On top of that is the inverted index. But I'm not
sure how you get to 80 GB. Maybe you are using ngrams somewhere? Maybe
you can check with the Skywalker plugin what's inside your index?
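
As a rough back-of-the-envelope sketch of that breakdown (Python, not
anything ES ships): the function name and the inverted_index_ratio value
are made up for illustration, and the real inverted-index overhead depends
on analyzers, field count, and term cardinality.

```python
def estimate_index_size_gb(raw_gb, source_enabled=True, fields_stored=False,
                           inverted_index_ratio=0.5):
    """Rough on-disk estimate for a single copy of the data (no replicas).

    inverted_index_ratio is a hypothetical placeholder, not a measured value.
    """
    size = 0.0
    if source_enabled:
        size += raw_gb  # _source keeps a copy of the original JSON
    if fields_stored:
        size += raw_gb  # stored fields keep a second copy of the values
    size += raw_gb * inverted_index_ratio  # inverted index on top
    return size


# 10 GB of raw logs with _source enabled and every field also stored
print(estimate_index_size_gb(10, fields_stored=True))
```

Even with a generous ratio this stays well short of 80 GB for one copy,
which is why something else (ngrams, replicas) is the likelier culprit.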

Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

--


(Dylan Johnson) #3

Seems like my replicas were making up the space.

So I start with a couple of GB of data and end up with about 3x that. This seems expensive?

Does anyone have any info on compressing the indices or using some sort of archiving setup? In other words, what's the best way to save space when using ES?

Does anyone have any recommendations on configuring ES so that the DB size is similar to that of the original data that was parsed into it?

My CIO won't take me seriously when I tell him that for every TB of data we need 3 TB in ES.
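
For what it's worth, the arithmetic behind that multiplier can be sketched as below. This assumes the primary copy comes out at roughly the size of the raw data and that the index has 2 replicas; neither number is confirmed here, and total_disk_gb is just an illustrative helper.

```python
def total_disk_gb(primary_gb, number_of_replicas):
    """Cluster-wide disk use: one primary copy plus one full copy per replica."""
    return primary_gb * (1 + number_of_replicas)


# Assumption: 1 TB (1000 GB) of primary data, 2 replicas
print(total_disk_gb(1000, 2))  # 3000
# Same data with replicas turned off
print(total_disk_gb(1000, 0))  # 1000
```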

Thanks

D

--


(Otis Gospodnetić) #4

Hello,

You could disable _source. You could compress it -
http://www.elasticsearch.org/guide/reference/mapping/source-field.html
You could have 0 replicas (how many does the DB have?)

See also
http://www.elasticsearch.org/guide/reference/index-modules/store.html .
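
A minimal sketch of what those options might look like as an index-creation
body, built in Python so it can be printed and inspected. The type name
"logs" and the helper name are made up; the _source "enabled" and "compress"
keys are taken from the 2012-era mapping reference linked above, so please
check them against the version you actually run.

```python
import json


def build_index_body(type_name="logs", keep_source=True, replicas=0):
    """Build a create-index body that trades features for disk space.

    type_name is a hypothetical mapping type, used only for illustration.
    """
    source = {"enabled": keep_source}
    if keep_source:
        # Keep _source but ask for it to be compressed (2012-era option).
        source["compress"] = True
    return {
        "settings": {"index": {"number_of_replicas": replicas}},
        "mappings": {type_name: {"_source": source}},
    }


body = build_index_body()
print(json.dumps(body, indent=2, sort_keys=True))
```

Keep in mind that disabling _source saves the most space but means you can
no longer get the original document back from ES, so compressing it is the
safer first step.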

Otis


--


(Shay Banon) #5

If you use "vanilla" logstash, without changing anything, then you can look here for some hints (specifically, the new stored compression): https://github.com/logstash/logstash/wiki/Elasticsearch-Storage-Optimization. @whack also gisted an experiment that he did with storage sizes in ES. I can't find it now, but you can possibly ping him.

--

