Elasticsearch on EC2. What kind of instance types to use?

Hi,

First, as you suggest, it really, really depends on what you want to do
with Elasticsearch :) There's no generic "optimal configuration". With that
in mind, I can share my experiences:

  • It does not matter that much whether you have many weak boxes or fewer
    strong boxes. What counts is the total amount of RAM your use case
    needs, and you can spread that amount across boxes as you wish.

  • Obviously, the fewer boxes you have, the less networking overhead you
    generate, which may matter for you. In general, I have always preferred
    "fewer, stronger boxes", but everyone's mileage varies.

  • In the "more, weaker boxes" approach, you're able to scale in smaller
    increments of capacity, which may be financially more attractive.

  • I wouldn't go below m1.xlarge (15GB RAM) for a serious cluster with
    lots of faceting, sorting, etc.

  • m2.2xlarge (34.2GB) boxes perform really well, but they have a serious
    drawback: they can't use the high I/O (IOPS) EBS volumes, which are
    a great fit for ES, making everything from index loading to searching
    snappy. m1.xlarge and m2.4xlarge support them, see
    http://aws.amazon.com/ec2/instance-types.

  • When you have multiple boxes, make 100% sure you have some automated
    configuration management such as Puppet, Chef, etc. in place. You'll need to
    synchronize configs on the boxes, run commands on them, and so on. See the
    https://github.com/karmi/cookbook-elasticsearch cookbook for Chef.

  • Make absolutely sure you're using a recent Java version -- the versions
    on the AMIs provided by Amazon are outdated by a couple of years in some
    cases, so update your packages and install a newer one.
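
To make the "spread the RAM as you wish" point a bit more concrete, here is a rough sketch of the arithmetic. The 120 GB target is a made-up number for illustration, not a recommendation, and the 68.4 GB figure for m2.4xlarge is not from my notes above, so double-check it against the AWS instance types page. The helper names are hypothetical too.

```python
import math

# RAM per instance in GB. m1.xlarge (15 GB) is the figure quoted above;
# m2.4xlarge (68.4 GB) is an assumption added for comparison.
INSTANCE_RAM_GB = {
    "m1.xlarge": 15.0,
    "m2.4xlarge": 68.4,
}


def nodes_needed(target_ram_gb, ram_per_node_gb):
    """Smallest number of nodes whose combined RAM covers the target."""
    return math.ceil(target_ram_gb / ram_per_node_gb)


def sizing_table(target_ram_gb):
    """Map instance type -> (node count, total RAM in GB, GB added per extra node)."""
    table = {}
    for name, ram in INSTANCE_RAM_GB.items():
        count = nodes_needed(target_ram_gb, ram)
        table[name] = (count, round(count * ram, 1), ram)
    return table


if __name__ == "__main__":
    # Hypothetical cluster-wide RAM target of 120 GB.
    for name, (count, total, step) in sizing_table(120).items():
        print(f"{name}: {count} nodes, {total} GB total, +{step} GB per added node")
```

The last column is the trade-off from the list: with the smaller boxes each scaling step adds less RAM (and costs less), while the bigger boxes mean fewer nodes talking to each other over the network.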

Karel
