How to optimize disk usage?


(llowder@oreillyauto.com) #1

I am currently testing a program that uses ElasticSearch to store large
amounts of data (from log files).

My initial test showed a 250% increase in disk usage from the raw data to
what was stored in ES.

I have looked online some, but am new enough to this that I may not have
used the right terms.

What options are available to optimize/tune disk usage when using ES?
Can someone please either give me some tips or point me to some documents?

Mostly what I will be storing is logs from Tomcat, Apache and 70+ web apps.

Thank you.


(Paul Brown) #2

This sounds reasonable to me.

If you have one replica, an index, and are storing most of your data, then 2.5x is reasonable. (Index ~ data/3, so with a primary plus one replica, 2 x (data + data/3) ~ 2.7x data, which is in the same range as the 2.5x you measured.)

-- Paul
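
The back-of-the-envelope estimate above can be written out as a small calculation. This is only a sketch: the function name, the default index overhead of one third of the raw data, and the default of one replica are assumptions taken from the rule of thumb in this reply, not measured values.

```python
def estimated_disk_usage(raw_bytes, index_ratio=1 / 3, replicas=1):
    """Estimate total bytes on disk for raw_bytes of source data.

    index_ratio: assumed index size as a fraction of the raw data
                 (rule-of-thumb value, not a measurement).
    replicas:    number of replica copies kept in addition to the primary.
    """
    per_copy = raw_bytes * (1 + index_ratio)  # stored data + index
    return per_copy * (1 + replicas)          # primary + replicas


raw = 100 * 1024**3  # hypothetical: 100 GiB of raw logs
for replicas in (0, 1):
    total = estimated_disk_usage(raw, replicas=replicas)
    print(f"replicas={replicas}: {total / raw:.2f}x raw data")
```

With these assumed numbers, dropping the replica roughly halves the footprint, which is why comparing sizes with and without replicas is a useful first check.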

On May 18, 2012, at 1:41 PM, llowder@oreillyauto.com wrote:

What options are available to optimize/tune disk usage when using ES? Can someone please either give me some tips or point me to some documents?



(Otis Gospodnetić) #3

Hi,

Things to look into:

  • Number of replicas: set it to 0 and then compare sizes.
  • Is _source on and uncompressed?
  • Is _all on?
  • Are individual fields marked as stored, or just indexed?

Otis
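
As a rough illustration of the checklist above, here is a sketch of the request bodies one might send to try each item. It assumes the settings and mapping options available in ElasticSearch releases from around the time of this thread (`index.number_of_replicas`, `_source.compress`, `_all.enabled`, and per-field `store`); some of these were changed or removed in later versions, so check the documentation for the version you run. The type name `logline` and the field names are hypothetical.

```python
import json

# Body for an index settings update: drop replicas to 0 while comparing sizes.
replica_settings = {"index": {"number_of_replicas": 0}}

# Mapping for a hypothetical "logline" type.
logline_mapping = {
    "logline": {
        "_source": {"compress": True},  # keep _source, but compressed
        "_all": {"enabled": False},     # skip the catch-all _all field
        "properties": {
            # Indexed for search, not stored separately from _source.
            "message": {"type": "string", "index": "analyzed", "store": "no"},
            "host": {"type": "string", "index": "not_analyzed", "store": "no"},
        },
    }
}

print(json.dumps(replica_settings, indent=2))
print(json.dumps(logline_mapping, indent=2))
```

Changing one option at a time and reindexing the same sample of logs makes it easier to see which one accounts for most of the disk usage.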



(Shay Banon) #4

The logstash wiki has some info on how to reduce the storage.



(Eric Jain) #5

On May 18, 11:41 am, "llow...@oreillyauto.com"
llow...@oreillyauto.com wrote:

Can someone please either give me some tips or point me to some documents?

You can use Luke [http://code.google.com/p/luke/] to peek inside the
index and see if there is anything unnecessary in there.
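
Before opening the index in Luke, it can also help to see which index files take up the space. The sketch below is not part of Luke; it simply sums file sizes by extension under a directory. The function names and the path in the usage comment are hypothetical. In a Lucene index, for example, `.fdt` files hold stored field data, so a large `.fdt` total suggests stored fields or `_source` dominate.

```python
import os
from collections import defaultdict


def size_by_extension(index_dir):
    """Return {extension: total bytes} for all files under index_dir."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)


def report(index_dir):
    """Print extensions sorted by total size, largest first."""
    totals = size_by_extension(index_dir)
    for ext, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{ext:>8}  {size / 1024**2:10.1f} MiB")


# Usage (hypothetical path to one shard's index directory):
# report("/path/to/shard/index")
```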

