Can ES scale to 30TB / day, and still be usable?

This is a typical logstash/elasticsearch/kibana setup. I have a small
environment logging 20GB / day that seems to work fine. At 30TB, very
little will be able to be cached in RAM; can ES still be usable at that
point?

Also, what's the best way to pick the proper index creation rate (per
day? per hour?). Is there a guideline for max index size?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/865e67f5-bcd4-4247-9f39-813424d6747c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

That's a massive volume! You should be able to get this to work, though
you'll need to plan things and have access to a large number of servers and
easy provisioning to let you expand as you grow.

Given the volume you may want to consider hourly indexes. Index size isn't
so much a problem as segment size; you should aim to keep each segment
between 10-20GB to allow for optimal reallocation and merging.
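If you want to eyeball whether segments are landing in that range, the cat API can list per-segment sizes; the index name below is just an example:

```shell
# List per-segment sizes for one day's index; h= picks the columns to show
curl 'localhost:9200/_cat/segments/logstash-2014.09.20?v&h=index,shard,segment,size'
```

This needs a running cluster, so treat it as a fragment to adapt rather than something to run as-is.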

You should also consider a tiered system: keep your active (eg last 6-12
hours) indexes on a few high performance machines with SSDs/flash to allow
optimal indexing and retrieval, then move the indexes onto slower
SAS/SATA drives after that time.
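One way that tiering can be wired up is shard allocation filtering: tag nodes with a custom attribute in elasticsearch.yml (the attribute name `box_type` and the index names here are just illustrative), then flip the requirement on an index to make its shards migrate:

```shell
# elasticsearch.yml on the SSD nodes:     node.box_type: hot
# elasticsearch.yml on the archive nodes: node.box_type: warm

# Pin a fresh hourly index to the hot tier
curl -XPUT 'localhost:9200/logstash-2014.09.20.15/_settings' -d '{
  "index.routing.allocation.require.box_type": "hot"
}'

# Later, retag it so Elasticsearch relocates the shards to the warm tier
curl -XPUT 'localhost:9200/logstash-2014.09.20.15/_settings' -d '{
  "index.routing.allocation.require.box_type": "warm"
}'
```

A tool like Curator can automate the retagging on a schedule; again, this is a sketch of the approach, not a drop-in config.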

A lot of what you may need depends on your retention period and how you
expect the data to be accessed.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


Thank you Mark,

It's good to know it's possible. Now I just need to start testing, lots. :)

My target is a retention of 10 days with 5 clusters (one per
datacenter) behind a tribe node. The problem is that if I do hourly
indexes, that's 1,200 indexes over 10 days (5 * 24 * 10). Kibana (the primary
use case) does not work well over more than a few hundred indexes. Aliases
may help, but that will require some custom code.
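For a sense of scale, here is the index arithmetic above, plus what the shard count per cluster would look like under an assumed (purely illustrative) 5 shards and 1 replica per hourly index:

```shell
# 5 clusters x 24 hourly indexes/day x 10 days of retention
indexes=$((5 * 24 * 10))
echo "total indexes across clusters: $indexes"

# Assuming 5 primary shards and 1 replica per index (illustrative only),
# each cluster carries 24 * 10 * 5 * 2 shards
shards_per_cluster=$((24 * 10 * 5 * 2))
echo "shards per cluster: $shards_per_cluster"
```

That per-cluster shard count is the number that tends to drive cluster-state and heap overhead, so it is worth computing for whatever shard settings you actually pick.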



Mind sharing the size of your clusters? Number of nodes and specs on the
boxes? I'm also interested in hearing your experience with the tribe node.

If you do end up using a setup with a lot of indices, it is always helpful
to drop bloom filters on indices that are not being actively indexed, to
alleviate memory pressure. That might not be practical if you are indexing
data from several days back, though, as it will tank your indexing
performance.
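In the 1.x line this is exposed as a dynamic index setting; a sketch of disabling bloom filter loading on an index that is no longer written to (the index name is an example):

```shell
# Stop loading bloom filters into heap for a cold index
# (index.codec.bloom.load, Elasticsearch 1.x; the default changed in later releases)
curl -XPUT 'localhost:9200/logstash-2014.09.10/_settings' -d '{
  "index.codec.bloom.load": false
}'
```

Curator can apply this across all indices older than a cutoff, which fits the hourly-index pattern discussed above.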

In theory tiering sounds great. My only concern would be constantly
shipping data off to nodes focused on storage, away from the ones it was
originally indexed on. It could be a lot of network and disk I/O overhead
and, depending on how hot you are running your nodes, it might impact
indexing.
