How big can/should you scale Elasticsearch

Rob_Blackin · August 29, 2014, 5:27pm

We are trying to implement a 5 TB, 10 Billion item Elasticsearch cluster.
The key is an integer and the item data is fairly small.

We seem to run into issues around loading. Seems to slow down as the index
gets bigger.

We are doing this on EC2 i2.xlarge nodes.

How many documents/TB do you think we can load per node max?

So if we can do 2 Billion each then we need 5 nodes. We are trying to size
it.

Any advice is welcome, even if the advice is that this is not a good thing to do.

thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3faa4de9-0a27-49dc-8f68-ceebd5569da9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Arie · August 29, 2014, 7:17pm

If you look at how the folks at Found size things (Official Elasticsearch
Pricing: Elastic Cloud, Managed Elasticsearch | Elastic), the data on one
ES server is 8 times its memory if it is to run smoothly, though I do not
know how reliable that ratio is. If you have a lot of ES nodes, consider a
dedicated master node that holds no data; it's a best practice I have read
somewhere.

For example, 16GB of memory equals 128GB of data.
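
As a rough sketch of that arithmetic, the snippet below applies the 8x data-to-memory ratio to the 5 TB target from the original question. The 30.5 GB memory figure for an i2.xlarge is an assumption taken from the published instance specs, not something stated in this thread, and the function name is purely illustrative. Replicas are not counted.

```python
import math

def nodes_for_ratio(total_data_gb: float, ram_per_node_gb: float, ratio: float = 8.0) -> int:
    """Data nodes needed if each node holds at most ratio * RAM of data."""
    capacity_per_node_gb = ram_per_node_gb * ratio
    return math.ceil(total_data_gb / capacity_per_node_gb)

# 16 GB of memory -> 128 GB of data fits on one node under the 8x rule.
print(nodes_for_ratio(128, 16))     # 1

# 5 TB (taken as 5000 GB) on nodes with 30.5 GB RAM (assumed i2.xlarge spec).
print(nodes_for_ratio(5000, 30.5))  # 21
```

Under those assumptions the ratio points to roughly 21 data nodes rather than 5, so it is worth checking how well the 8x figure fits your own workload before sizing on it.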


nik9000 · August 29, 2014, 7:38pm

On Fri, Aug 29, 2014 at 1:27 PM, Rob Blackin robblackin@gmail.com wrote:

We are trying to implement a 5 TB, 10 Billion item Elasticsearch cluster.
The key is an integer and the item data is fairly small.

We're running around 5.5TB right now without a problem. The biggest
annoyance is that rolling restarts take time proportional to how much data
you have.

We have much larger documents than you have so we only store 181 million or
so. Our documents are interactively maintained - a consistent portion of
them are updated daily with some creates and a few rare deletes.

You might want to think about how you do sharding - look into routing to
see if you can get away with oversubscribing on shards. You might also
look into using multiple indexes. Shay gave a talk on how you
could subdivide one large set of data into multiple indexes to help
things. One 5TB index would be difficult to maintain, as would any shards
that are more than, say, 20GB. Just shuffling those shards from system to
system for rebalancing gets expensive. Merges on those shards have a
higher upper bound on disk io and cache thrash.
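
To make the shard-size point concrete, here is a minimal sketch of the arithmetic plus one possible way to spread integer keys over several smaller indexes. The key-range scheme, the `items` prefix, and the 500 million keys-per-index figure are all hypothetical choices for illustration; this is not necessarily the approach from the talk mentioned above.

```python
import math

def shard_count(total_data_gb: float, max_shard_gb: float = 20.0) -> int:
    """Minimum number of primary shards so that none exceeds max_shard_gb."""
    return math.ceil(total_data_gb / max_shard_gb)

def index_for_key(key: int, keys_per_index: int = 500_000_000, prefix: str = "items") -> str:
    """Map an integer key to one of several smaller indexes by key range."""
    return f"{prefix}-{key // keys_per_index:03d}"

# 5 TB (taken as 5000 GB) at no more than 20 GB per shard.
print(shard_count(5000))             # 250

# With 500 million keys per index, 10 billion keys span 20 indexes.
print(index_for_key(1_234_567_890))  # items-002
```

With 250 or so primary shards in play, splitting them across many indexes keeps each index small enough to move, merge, and rebuild on its own.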

We seem to run into issues around loading. Seems to slow down as the index
gets bigger.

Check on your merge rate. This is old but it'll give you some idea of what
is going on:

You can tune this a bit - especially if your data comes in spurts.
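
As an illustration of the kind of tuning people often try during a large bulk load, the sketch below builds a settings body that could be sent to an index's `_settings` endpoint. The specific settings and values are my assumptions, not what the post above refers to: they are placeholders rather than recommendations, and whether each one exists or can be changed on a live index depends on your Elasticsearch version.

```python
import json

# Hypothetical bulk-load settings; every value here is a placeholder.
bulk_load_settings = {
    "index": {
        # Stop periodic refreshes while loading; restore a normal interval afterwards.
        "refresh_interval": "-1",
        # Load without replicas, then raise the count once the load is done.
        "number_of_replicas": 0,
        # Cap concurrent merge threads; the right value depends on your disks.
        "merge": {"scheduler": {"max_thread_count": 1}},
    }
}

# Print the JSON body that would accompany a settings update request.
print(json.dumps(bulk_load_settings, indent=2))
```

Remember to revert the refresh interval and replica count after loading, since both are only sensible as temporary values.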

We are doing this on EC2 i2.xlarge nodes.

How many documents/TB do you think we can load per node max?

So if we can do 2 Billion each then we need 5 nodes. We are trying to size
it.

I can't speak to Amazon because we use physical machines. We use 18
machines with two reasonably nice Intel SSDs per machine, 96GB of RAM, and
pretty sizeable CPUs, and it isn't really enough to handle the query load we
want to throw at it. I imagine the shape of your load is going to be
different though.

Nik

