I have a directory that has 10GB of data and I used Logstash to parse the data into Elasticsearch, which works fine. I have 2 indices and 0 replicas, however after Logstash has finished parsing the data into ES, the ES /data directory is 80GB?
On Oct 3, 2012, at 2:16 AM, Otis Gospodnetic wrote:
Hello Dylan,
ES stores the original JSON in the special _source field. That right there means your index will be at least 10GB in size. Additionally, it is possible your fields are also stored and not just indexed - the stored part would be another 10GB. On top of that is the inverted index. But I'm not sure how you get to 80GB. Maybe you are using ngrams somewhere? Maybe you can check with the Skywalker plugin what's inside your index?
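As a rough, purely illustrative sketch of how to see where the space is actually going (this is not from the thread; it assumes a reasonably recent ES node on localhost:9200, the Python requests library, and a hypothetical index name), you can pull the on-disk store size from the indices stats API and compare it to the raw data size:

import requests

ES = "http://localhost:9200"
index = "logstash-2012.10.02"  # hypothetical index name

# Indices stats API: reports the on-disk store size for the primary shards
# and for primaries plus replicas combined.
stats = requests.get(f"{ES}/{index}/_stats").json()["indices"][index]
primary_bytes = stats["primaries"]["store"]["size_in_bytes"]
total_bytes = stats["total"]["store"]["size_in_bytes"]

raw_gb = 10  # size of the original directory fed to Logstash
print(f"primary store: {primary_bytes / 2**30:.1f} GB "
      f"({primary_bytes / 2**30 / raw_gb:.1f}x the raw data)")
print(f"with replicas: {total_bytes / 2**30:.1f} GB")

If the primary store alone is already several times the raw size, the usual suspects are _source, stored fields, and heavily analyzed fields; if the gap only shows up in the replica-inclusive number, it's the replica count.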
So I start with a couple of GB of data and end up with about 3x that; this seems expensive?
Does anyone have any info on compressing the indices or using some sort of archiving setup? In other words, what's the best way to save space when using ES?
Does anyone have any recommendations on configuring ES so that the DB size is similar to that of the original data that was parsed into it?
My CIO won't take me seriously when I tell him that for every TB of data we need 3TB in ES.
Thanks
D
On Wednesday, October 3, 2012 1:41:41 PM UTC-4, Dylan Johnson wrote:
Seems like my replicas were making up the space.
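For anyone landing here with the same symptom: each replica is a full extra copy of the index on disk, and the replica count can be changed on a live index through the update settings API. A minimal sketch (not from the thread; same localhost/requests assumptions and a hypothetical index name):

import requests

ES = "http://localhost:9200"
index = "logstash-2012.10.02"  # hypothetical index name

# Update index settings API: dropping number_of_replicas to 0 removes the
# replica copies, trading redundancy and search throughput for disk space.
resp = requests.put(
    f"{ES}/{index}/_settings",
    json={"index": {"number_of_replicas": 0}},
)
print(resp.json())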
If you use "vanilla" Logstash, without changing anything, then you can look at the elastic/logstash repository on GitHub for some hints (specifically, the new stored compression). @whack also gisted an experiment he did with storage sizes in ES, but I can't find it now; you can possibly ping him.
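The exact compression knobs have moved around between versions, so treat this as a sketch rather than the thread's answer: on later ES releases (2.x and up) the stored fields, including _source, can be packed harder with the best_compression codec, which has to be set when the index is created (same localhost/requests assumptions, hypothetical index name):

import requests

ES = "http://localhost:9200"

# Create an index with heavier stored-field compression (DEFLATE rather than
# the default LZ4) and no replicas. index.codec applies to ES 2.x and later;
# the 2012-era settings discussed in this thread had different names.
resp = requests.put(
    f"{ES}/logstash-compressed",  # hypothetical index name
    json={
        "settings": {
            "index": {
                "codec": "best_compression",
                "number_of_replicas": 0,
            }
        }
    },
)
print(resp.json())

Beyond that, most of the remaining savings come from the mappings: don't store fields you never retrieve, keep the _all field disabled on old versions, and only disable _source if you can live without getting the original document back.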