Elasticsearch on EC2: which instance types to use?

Hi,

I guess many of us are deploying Elasticsearch on EC2 or plan to. It would
be great if some ES experts could give us some hints on how to choose EC2
instance types.

I know price is the first parameter to consider. But there are still many
ways to spend our money: x strong boxes vs. 2x weaker boxes? RAM vs. CPU?
EBS? I guess it depends on the usage (facets, ...).

For example, what would you recommend: a 4-node cluster with 32GB RAM
instances or a 7-node cluster with 16GB RAM instances?

If any of you have recommendations to share, please do.

Thanks.

--

Interesting topic; I look forward to hearing the different approaches.

If I may be so bold: would a 14-node cluster with just 8GB of RAM per node
(m1.large) be interesting? Or even a 28-node cluster with 4GB (m1.medium)?
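To put numbers on the layouts mentioned so far, the sketch below simply totals the RAM each one provides. It uses the round per-node figures from the posts (32, 16, 8 and 4GB) rather than exact EC2 specifications, and it says nothing about performance; it is only the arithmetic.

```python
# Candidate cluster layouts from the thread: (label, node count, RAM per node in GB).
# RAM figures are the round numbers used in the posts, not exact EC2 specs.
LAYOUTS = [
    ("4 x 32GB", 4, 32),
    ("7 x 16GB", 7, 16),
    ("14 x 8GB (m1.large)", 14, 8),
    ("28 x 4GB (m1.medium)", 28, 4),
]


def total_ram_gb(nodes: int, ram_per_node_gb: int) -> int:
    """Total RAM across the cluster."""
    return nodes * ram_per_node_gb


if __name__ == "__main__":
    for label, nodes, ram in LAYOUTS:
        print(f"{label:<22} total RAM: {total_ram_gb(nodes, ram)} GB")
```

The 7-, 14- and 28-node layouts all come to 112GB, while the 4-node layout gives 128GB, so the two options in the original question are not quite equivalent in total memory.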

Best regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E robin@us2.nl

http://goo.gl/Lt7BC

On Mon, Dec 3, 2012 at 11:42 AM, fonzo14 thomas_mahier@yahoo.fr wrote:

--

Hi,

First, as you suggest, it really, really depends on what you want to do
with Elasticsearch :) There's no generic "optimal configuration". With that
in mind, I can share my experiences:

  • It does not matter that much whether you have many weaker boxes or
    fewer stronger boxes. What counts is the amount of RAM your use case
    needs, and you can spread that amount as you wish.

  • Obviously, the fewer boxes you have, the less network overhead you
    generate, which may matter to you. In general, I've always preferred
    "fewer, stronger boxes", but your mileage may vary.

  • With the "more, weaker boxes" approach, you can scale in smaller
    increments of capacity, which may be financially more attractive.

  • I wouldn't go below m1.xlarge (15GB RAM) for a serious cluster with
    lots of faceting, sorting, etc.

  • m2.2xlarge (34.2GB) boxes perform really well, but they have a serious
    drawback: they can't use the high I/O (IOPS) EBS volumes, which are a
    great fit for ES, making everything from index loading to searching
    snappy. m1.xlarge and m2.4xlarge support them; see
    http://aws.amazon.com/ec2/instance-types.

  • When you have multiple boxes, make 100% sure you have automated
    configuration management such as Puppet or Chef in place. You'll need to
    synchronize configs across the boxes, run commands on them, etc. For Chef,
    see the https://github.com/karmi/cookbook-elasticsearch cookbook.

  • Make absolutely sure you're using a recent Java version: the versions
    on the AMIs provided by Amazon are a couple of years out of date in some
    cases, so update your packages and install a newer one.
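The first three points can be turned into a small planning aid: fix the total RAM your use case needs, then see how many nodes each instance type takes to reach it and how big each scaling step is. The per-type RAM figures are the published sizes of these 2012-era EC2 types; the 60GB target and the helper name `plan` are hypothetical, chosen only for illustration.

```python
import math

# RAM per instance (GB) for the EC2 instance types discussed in the thread.
INSTANCE_RAM_GB = {
    "m1.medium": 3.75,
    "m1.large": 7.5,
    "m1.xlarge": 15.0,
    "m2.2xlarge": 34.2,
    "m2.4xlarge": 68.4,
}


def plan(target_ram_gb: float, min_ram_per_node_gb: float = 0.0):
    """For each instance type, return (type, nodes needed, provisioned RAM, scaling step).

    The scaling step is the RAM one extra node adds: small instances let you
    grow in small increments, large ones mean fewer nodes talking to each other.
    Types with less than min_ram_per_node_gb of RAM are skipped.
    """
    rows = []
    for name, ram in sorted(INSTANCE_RAM_GB.items(), key=lambda kv: kv[1]):
        if ram < min_ram_per_node_gb:
            continue
        nodes = math.ceil(target_ram_gb / ram)  # enough nodes to reach the target
        rows.append((name, nodes, round(nodes * ram, 2), ram))
    return rows


if __name__ == "__main__":
    # Hypothetical 60GB target; skip anything below m1.xlarge (15GB), per the advice above.
    for name, nodes, provisioned, step in plan(60, min_ram_per_node_gb=15):
        print(f"{name:<11} nodes={nodes:<3} provisioned={provisioned}GB step=+{step}GB")
```

The trade-off is visible in the output: the larger types reach the target with fewer nodes but overshoot it and grow in coarser steps.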

Karel

--

If I may be so bold: would a 14-node cluster with just 8GB of RAM per node
(m1.large) be interesting? Or even a 28-node cluster with 4GB (m1.medium)?

Why would you prefer that to fewer, stronger boxes? Financially, it's all the
same in AWS: you pay for the resources used, not for instances per se.
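This holds as long as the hourly price scales linearly with instance size, which was roughly the case within the m1 family, where each step up doubles the RAM. The sketch assumes exactly linear pricing and uses a made-up unit price; real prices vary by region and over time.

```python
from fractions import Fraction

# Relative size of each m1 instance, in multiples of an m1.medium
# (RAM doubles at each step: 3.75 -> 7.5 -> 15 GB).
SIZE_UNITS = {"m1.medium": 1, "m1.large": 2, "m1.xlarge": 4}

# HYPOTHETICAL price of one m1.medium-hour, for illustration only.
UNIT_PRICE_PER_HOUR = Fraction(1, 10)


def hourly_cost(instance_type: str, nodes: int) -> Fraction:
    """Cluster cost per hour, assuming price scales linearly with instance size."""
    return nodes * SIZE_UNITS[instance_type] * UNIT_PRICE_PER_HOUR


if __name__ == "__main__":
    for itype, nodes in [("m1.xlarge", 7), ("m1.large", 14), ("m1.medium", 28)]:
        print(f"{nodes:>2} x {itype:<10} ${float(hourly_cost(itype, nodes)):.2f}/hour")
```

All three clusters add up to the same 28 size units, so under this assumption they cost the same; the choice between them comes down to the technical trade-offs rather than price.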

Karel

--

I'm curious about the CPU vs. memory question, because I was surprised to
see this high-CPU, low-memory usage pattern for ES:

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM
    26320 elastics  20   0 1565m 439m 3632 S 89.4 11.7

This is on an m1.medium.

Also, this was taken while I was bombarding it with queries to warm it up;
maybe that's why?
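The top row above can be read a bit more precisely: %MEM is the resident set (RES) as a share of physical memory, so the two columns together give the machine's total RAM. The sketch parses the quoted row and does that division. The helper names are made up for illustration, and it assumes top's older convention that memory columns without a suffix are in KiB.

```python
# Column names and values from the `top` output quoted above.
HEADER = "PID USER PR NI VIRT RES SHR S %CPU %MEM".split()
LINE = "26320 elastics 20 0 1565m 439m 3632 S 89.4 11.7"


def parse_top_row(header, line):
    """Zip a whitespace-separated `top` row with its header into a dict."""
    return dict(zip(header, line.split()))


def to_mib(value: str) -> float:
    """Convert a top memory field to MiB: 'g' = GiB, 'm' = MiB, no suffix = KiB."""
    if value.endswith("g"):
        return float(value[:-1]) * 1024
    if value.endswith("m"):
        return float(value[:-1])
    return float(value) / 1024


row = parse_top_row(HEADER, LINE)
res_mib = to_mib(row["RES"])
mem_pct = float(row["%MEM"])
# %MEM is RES as a share of physical RAM, so RES / (%MEM / 100) estimates total RAM.
implied_total_mib = res_mib / (mem_pct / 100)
print(f"RES {res_mib:.0f} MiB is {mem_pct}% of ~{implied_total_mib:.0f} MiB physical RAM")
```

For the quoted row this comes out at roughly 3,750 MiB, about what an m1.medium (3.75GB) provides. So ES was holding only about 440MB resident while keeping the instance's single virtual CPU close to fully busy.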

On Tuesday, December 4, 2012 6:25:58 AM UTC-5, Karel Minařík wrote:

--

Hello,

It depends on the amount of data and what it looks like, plus the queries,
filters, or facets you run, and also on how often indexing and search
operations happen.

Indexing is usually very CPU- and I/O-bound, but doing lots of faceting and
filtering will eat memory. Facets usually load fields and/or IDs into memory,
and filters are mostly cached. Then there are field caches: the more data
you have, the more memory you'll need. As you can see, most of the memory
usage is there for performance reasons, and that's usually a good thing. Most
of it is also configurable, especially the caches.
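To see why field caches grow with the data, here is a deliberately simplified back-of-the-envelope model: one fixed-size value per document for each faceted or sorted field (8 bytes, as for a long or double). This is an assumption, not how Elasticsearch actually sizes its caches, since real usage depends on field type, cardinality, multi-valued fields and version. It only illustrates the linear relationship; the function names and document counts are hypothetical.

```python
# Rough, linear estimate of field cache memory for faceted/sorted fields.
# ASSUMPTION: one fixed-size value per document per field. Treat the result
# as an order of magnitude, not a prediction of real Elasticsearch usage.


def field_cache_bytes(num_docs: int, fields: int, bytes_per_value: int = 8) -> int:
    """Estimated bytes needed to hold `fields` per-document values in memory."""
    return num_docs * fields * bytes_per_value


def human(n_bytes: int) -> str:
    """Format a byte count in GiB."""
    return f"{n_bytes / 1024**3:.2f} GiB"


if __name__ == "__main__":
    # Hypothetical index sizes, faceting/sorting on 2 numeric fields.
    for docs in (10_000_000, 50_000_000, 200_000_000):
        print(f"{docs:>11,} docs -> {human(field_cache_bytes(docs, fields=2))}")
```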

If I had to choose between memory, CPU, and I/O performance, I'd put a
subset of the production data on a small test cluster and run performance
tests (again, with the load I'd expect in production) while monitoring the
ES cluster. Then I'd have an idea of which resources are needed most, but
not before tuning the ES configuration to fit my needs.

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

On Tue, Dec 4, 2012 at 5:51 PM, Daniel Weitzenfeld dweitzenfeld@gmail.com wrote:

--

Thanks for your recommendations. They're very helpful.

--