Storage capacity (planning)?

Hi there,

My reason for looking at Elasticsearch is a different one entirely ^1,
but I have set up a development environment with about 20 servers that
use rsyslog to send their logs to a Logstash server (input: you guessed
it, syslog), from where the syslog entries travel through Redis and
ultimately end up in Elasticsearch. I believe this is the
next-next-finish setup documented at [1].

To my surprise, it only takes a day or so for /var/lib/elasticsearch/ to
grow to ~25 GB.
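For capacity planning, a rough linear extrapolation of that figure may
help. This is only a sketch: it assumes the growth rate stays constant,
and the 30-day retention window is a hypothetical example, not something
from this setup.

```python
# Back-of-the-envelope storage planning from the figures above.
# Observed: ~25 GB on disk after roughly one day, from ~20 servers.
# The 30-day retention window used below is a hypothetical example.


def daily_growth_per_server(total_gb: float, days: float, servers: int) -> float:
    """Average on-disk growth per server per day, in GB."""
    return total_gb / days / servers


def projected_storage(total_gb: float, days: float, retention_days: int) -> float:
    """Linear projection of on-disk size once `retention_days` of data are kept."""
    return total_gb / days * retention_days


per_server = daily_growth_per_server(25.0, 1.0, 20)
projection = projected_storage(25.0, 1.0, 30)
print(f"{per_server:.2f} GB/server/day")  # 1.25 GB/server/day
print(f"{projection:.0f} GB for 30 days")  # 750 GB for 30 days
```

At ~1.25 GB per mostly idle server per day, the per-server rate is the
number worth comparing against what a production host would generate.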

This is particularly surprising to me because the environment is
largely idle apart from some monitoring and a few cron jobs -- it
generates nowhere near as many syslog messages as a production
environment would.

Furthermore, with this rsyslog -> logstash collector -> redis ->
logstash indexer -> elasticsearch setup, I'm seeing the throughput on
the logical volume for the root filesystem rise continuously -- it's now
at about 4 MB/s. iotop only tells me that Elasticsearch is doing all of
this I/O, yet its data lives on the separate logical volume mounted on
/var/lib/elasticsearch/.
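To put the 4 MB/s next to the ~25 GB/day of stored data, here is the
unit conversion as a small sketch. It assumes decimal units (1 GB =
1000 MB) and that the rate is sustained for a whole day, which it was
not (it has been rising), so the result is only an order-of-magnitude
comparison.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def mb_per_s_to_gb_per_day(rate_mb_s: float) -> float:
    """Convert a sustained MB/s rate to GB/day (decimal units assumed)."""
    return rate_mb_s * SECONDS_PER_DAY / 1000


io_per_day = mb_per_s_to_gb_per_day(4.0)  # 345.6
ratio = io_per_day / 25.0                 # I/O volume vs. stored volume
print(f"{io_per_day:.1f} GB/day of I/O vs 25 GB stored (~{ratio:.0f}x)")
```

In other words, if sustained, the I/O would be more than ten times the
amount of data that actually ends up on disk, which is why the rising
rate looks out of proportion.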

I'm fairly certain I can cut down the number of log entries sent to the
centralized log server, and it's quite possible I'm doing something
wrong, but I was wondering whether anybody out there has gone through
such an exercise before, and whether my expectations are reasonable.

Thanks in advance,

Kind regards,

Jeroen van Meeuwen

^1: Kolab Groupware is looking into developing a single application
suite covering Archival, Backup/Restore and e-Discovery. It is very much
a work in progress; we're putting down some notes [2] and are doing the
initial probing of potential storage backend solutions.

[1] http://logstash.net/docs/1.3.3/tutorials/getting-started-centralized
[2] http://docs.kolab.org/architecture-and-design/bonnie.html

--
Systems Architect, Kolab Systems AG

e: vanmeeuwen at kolabsys.com
m: +44 74 2516 3817
w: http://www.kolabsys.com

pgp: 9342 BF08

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2fd3cb3bb2327950a8c1429e85949f3e%40kolabsys.com.
For more options, visit https://groups.google.com/groups/opt_out.

If you're not using it already, you should put Kibana into the mix. It
will give you a better understanding of what is going into ES (in fact,
this is what it was designed for).
Also install elastichq, kopf and bigdesk for cluster monitoring. There
is also elasticsearch-monitoring, which is pretty good for longer-term
stats.

Once you have those in place, you will have a better picture of your
cluster and its throughput.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 7 February 2014 20:05, Jeroen van Meeuwen (Kolab Systems) <
vanmeeuwen@kolabsys.com> wrote:

Jeroen,

If your objective is to keep ES storage as small as possible, you'd
probably want to first understand what your search requirements are and
then optimize the ES indexes accordingly. For example, if you don't need
replicas, you can set the number of replicas to 0. If you don't need the
_all field, you can disable it (using index templates, for example). If
you don't need every single field of your log events indexed, you can
have your LS filters output only the specific fields you are interested
in. Etc., etc.
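A minimal sketch of the replica, _all and field-selection suggestions,
assuming Elasticsearch 1.x (current at the time of this thread): an
index template body that sets replicas to 0 and disables _all for
indices matching logstash-* (the default index prefix of the Logstash
elasticsearch output), plus a plain-Python illustration of keeping only
whitelisted fields. The field whitelist and the sample event are
hypothetical. In later releases the "template" key became
"index_patterns" (6.0) and _all was removed (7.0), so the body would
need adjusting there. In Logstash itself the field selection would be
done in a filter, not in Python.

```python
import json

# Hypothetical whitelist of event fields worth indexing.
KEEP_FIELDS = frozenset({"@timestamp", "host", "message"})


def storage_saving_template(pattern: str = "logstash-*") -> dict:
    """Index template body in the Elasticsearch 1.x format (sketch)."""
    return {
        "template": pattern,                     # ES 1.x key; 6.x+ uses "index_patterns"
        "settings": {"number_of_replicas": 0},  # no replica copies of each shard
        "mappings": {
            "_default_": {                       # applies to every type in matching indices
                "_all": {"enabled": False}       # skip the catch-all _all field
            }
        },
    }


def prune_event(event: dict, keep: frozenset = KEEP_FIELDS) -> dict:
    """Return a copy of the event containing only whitelisted fields."""
    return {k: v for k, v in event.items() if k in keep}


if __name__ == "__main__":
    # JSON body one would send with: PUT /_template/<template-name>
    print(json.dumps(storage_saving_template(), indent=2))
    # Made-up sample event; "pid" is dropped because it is not whitelisted.
    print(prune_event({"@timestamp": "2014-01-01T00:00:00Z", "host": "web01",
                       "message": "job done", "pid": "1234"}))
```

Setting replicas to 0 means no copy of the data survives the loss of a
node, which is usually acceptable for a development cluster but worth
weighing before doing the same in production.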
