llowder
(llowder@oreillyauto.com)
May 18, 2012, 6:41pm
1
Currently testing out a program that uses ElasticSearch to store large
amounts of data (from log files).
My initial test showed a 250% increase in disk usage from the raw data to
what was stored in ES.
I have looked online some, but am new enough to this that I may not have
used the right terms.
What options are available to optimize/tune disk usage when using ES?
Can someone please either give me some tips or point me to some documents?
Mostly what I will be storing is logs from tomcat, apache and 70+ web apps.
Thank you.
This sounds reasonable to me.
If you have one replica, and you both index and store most of your data, then 2.5x is reasonable. (Index ~ data/3, so 2 x (data + data/3) ~ 2.7x data, which is in the same ballpark.)
-- Paul
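To make the back-of-the-envelope estimate above concrete, here is a minimal Python sketch of the same arithmetic. The function name and parameters are hypothetical, and the index-to-data ratio of 1/3 is only the rough figure quoted in the reply, not a measured value.

```python
def estimated_disk_usage(raw_size, index_ratio=1 / 3, copies=2):
    """Rough estimate of on-disk size for stored + indexed data.

    raw_size    -- size of the raw log data (any unit)
    index_ratio -- assumed index size as a fraction of the raw data
    copies      -- number of copies of each shard (primary + replicas)
    """
    per_copy = raw_size + raw_size * index_ratio  # stored data + index
    return copies * per_copy


# With one replica (two copies) the multiplier over the raw data is:
multiplier = estimated_disk_usage(1.0)
print(round(multiplier, 2))  # prints 2.67

# With replicas set to 0 (a single copy) it drops to:
single_copy = estimated_disk_usage(1.0, copies=1)
print(round(single_copy, 2))  # prints 1.33
```

Under these assumptions the replica count is the largest single factor, so it is the first thing to check.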
otisg
(Otis Gospodnetić)
May 19, 2012, 1:08am
3
Hi,
Things to look into:
- Number of replicas: set it to 0 and then compare sizes.
- Is _source on and uncompressed?
- Is _all on?
- Are individual fields marked as stored, or just indexed?
Otis
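As an illustration of the checklist above, here is a minimal Python sketch that builds an index-creation body with those knobs turned down. It only constructs and prints JSON and does not talk to a cluster. The key names (`number_of_replicas`, `_source.compress`, `_all.enabled`, and the `store`/`index` field options) follow the 2012-era Elasticsearch mapping conventions as I understand them; later versions changed or removed several of them, so check the documentation for your version. The type name `logs` and the field `message` are made up for the example.

```python
import json


def build_index_body(doc_type="logs", replicas=0):
    """Build a create-index request body tuned for smaller disk usage.

    doc_type and the 'message' field are hypothetical placeholders.
    """
    settings = {"index": {"number_of_replicas": replicas}}
    mappings = {
        doc_type: {
            # Keep _source but compress it (assumed 2012-era option).
            "_source": {"compress": True},
            # Disable the catch-all _all field if it is not searched.
            "_all": {"enabled": False},
            "properties": {
                # Index the field for search, but do not store it separately.
                "message": {"type": "string", "index": "analyzed", "store": "no"}
            },
        }
    }
    return {"settings": settings, "mappings": mappings}


body = build_index_body()
print(json.dumps(body, indent=2, sort_keys=True))
```

Indexing the same sample of logs with and without each option, and comparing the resulting index sizes, should show which setting matters most for a given data set.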
kimchy
(Shay Banon)
May 20, 2012, 10:03pm
4
Here is some info on the logstash wiki on how to reduce the storage:
GitHub - elastic/logstash: Logstash - transport and process your logs, events, or other data.
On May 18, 11:41 am, "llow...@oreillyauto.com" wrote:
Can someone please either give me some tips or point me to some documents?
You can use Luke (hosted on the Google Code Archive) to peek inside the
index and see if there is anything unnecessary in there.