Storage Ratios - My syslog streams are expanding in Elasticsearch to more than 10:1?

Dylan_Johnson · October 2, 2012, 6:11pm

I have a directory that holds 10GB of data, and I used Logstash to parse the
data into Elasticsearch, which works fine. I have 2 indices and 0 replicas.
However, after Logstash has finished parsing the data into ES, the ES /data
directory is 80GB.

This is unworkable. What's the reason for this?

Thanks

Dylan

--

otisg · October 3, 2012, 1:16am

Hello Dylan,

ES stores the original JSON in the special _source field. That right there
means your index will be at least 10GB in size. Additionally, it is
possible your fields are also stored and not just indexed - the stored part
would be another 10GB. On top of that is the inverted index. But I'm not
sure how you get to 80GB. Maybe you are using ngrams somewhere? Maybe
you can check with the Skywalker plugin what's inside your index?

Otis

Search Analytics - Cloud Monitoring Tools & Services | Sematext
Performance Monitoring - Sematext Monitoring | Infrastructure Monitoring Service
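
A rough way to reason about the breakdown above is to add up the per-copy components and multiply by the number of copies. The sketch below is only a back-of-the-envelope estimate: the function name and the ratio parameters are hypothetical, and the inverted-index ratio in particular is an assumed placeholder, since it depends heavily on analyzers (for example ngrams) and on the data itself.

```python
def estimate_index_size_gb(raw_gb, source_ratio=1.0, stored_ratio=0.0,
                           inverted_ratio=0.5, replicas=0):
    """Back-of-the-envelope estimate of on-disk index size in GB.

    raw_gb         -- size of the original log data
    source_ratio   -- _source size relative to raw data (about 1.0 if uncompressed)
    stored_ratio   -- extra space for individually stored fields (0.0 if none)
    inverted_ratio -- ASSUMED size of the inverted index relative to raw data
    replicas       -- number of replica copies of each primary shard
    """
    # Space needed for one full copy of the data.
    per_copy_gb = raw_gb * (source_ratio + stored_ratio + inverted_ratio)
    # Every replica is another full copy on top of the primary.
    return per_copy_gb * (1 + replicas)


if __name__ == "__main__":
    # 10GB of logs, _source kept, fields also stored, no replicas.
    print(estimate_index_size_gb(10, stored_ratio=1.0, replicas=0))
    # The same data with one replica doubles the footprint.
    print(estimate_index_size_gb(10, stored_ratio=1.0, replicas=1))
```

If the measured /data directory is far above such an estimate, that points to something outside this simple model, such as unexpected replicas or ngram-style analysis.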

--

Dylan_Johnson · October 3, 2012, 5:41pm

It seems my replicas were taking up the space.

So I start with a couple of GB of data and end up with about 3x that. This seems expensive.

Does anyone have any info on compressing the indices or using some sort of archiving setup? In other words, what's the best way to save space when using ES?

Does anyone have any recommendations on configuring ES so that the DB size is similar to that of the original data that was parsed into it?

My CIO won't take me seriously when I tell him that for every TB of data we need 3TB in ES.

Thanks

D

--

otisg · October 3, 2012, 8:17pm

Hello,

You could disable _source. You could compress it.

You could have 0 replicas (how many does the DB have?)

See also
Elasticsearch Platform — Find real-time answers at scale | Elastic .

Otis
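
As a minimal sketch of two of the suggestions above, the snippet below builds the JSON bodies one might send to Elasticsearch: an index settings update that sets the replica count to 0, and a mapping fragment that disables _source. The helper names are hypothetical. Exactly where the mapping fragment goes (for example, under a type name) and which compression settings are available depend on the Elasticsearch version, so treat this as an illustration and check the documentation for your release. Note that disabling _source means the original document can no longer be returned or used for reindexing.

```python
import json


def replica_settings(replicas=0):
    """Body for an index settings update that changes the replica count."""
    return {"index": {"number_of_replicas": replicas}}


def source_disabled_mapping():
    """Mapping fragment that turns off storage of the original JSON document."""
    return {"_source": {"enabled": False}}


if __name__ == "__main__":
    # Print the bodies so they can be inspected or pasted into an HTTP client.
    print(json.dumps(replica_settings(0), indent=2))
    print(json.dumps(source_disabled_mapping(), indent=2))
```

The replica count can be changed on an existing index, whereas _source has to be decided when the mapping is created.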

--

kimchy · October 4, 2012, 7:25am

If you use "vanilla" Logstash without changing anything, you can look here for some hints (specifically, the new stored compression): GitHub - elastic/logstash: Logstash - transport and process your logs, events, or other data. @whack also gisted an experiment he did with storage sizes in ES, but I can't find it now; you could ping him.

