Hi,
first, as you suggest, it really, really depends on what you want to do
with elasticsearch. There's no generic "optimal configuration". With that
in mind, I can share my experiences:
It does not matter that much if you have many weak boxes or fewer
strong boxes. What counts is the total amount of RAM you'll be using for
your use case, and you can spread that amount as you wish.

Obviously, the fewer boxes you have, the less networking overhead you
generate, which may affect you. In general, I always preferred "fewer
stronger boxes", but everyone's mileage varies.

In the "more weaker boxes" approach, you're able to scale by adding less
capacity at a time, which may be financially more attractive.

I wouldn't go below
m1.xlarge (15GB RAM) for a serious cluster with lots of faceting,
sorting, etc. m2.2xlarge (34.2GB) boxes perform really well, but they have
a serious drawback: they can't use the high I/O (IOPS) EBS volumes, which
are a great fit for ES, making everything from index loading to searching
snappy. m1.xlarge and m2.4xlarge support them, see
http://aws.amazon.com/ec2/instance-types.

When you have multiple boxes, make 100% sure you have some automated
configuration management such as Puppet, Chef, etc. in place. You'll need to
synchronize configs on the boxes, run commands on them, etc. See the
https://github.com/karmi/cookbook-elasticsearch cookbook for Chef.

Make absolutely sure you're using a recent Java version -- the versions
on the AMIs provided by Amazon are outdated by a couple of years in some
cases, so update your packages and install a newer one.

Karel
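
P.S. To make the "many weak boxes vs. fewer strong boxes" trade-off a bit
more concrete, here is a rough Python sketch. It only does the arithmetic:
how many boxes of a given type cover a total amount of RAM, and how much
capacity each additional box adds. The 120GB target is a made-up number, the
function names are mine, and the 68.4GB figure for m2.4xlarge comes from the
AWS instance list rather than from anything I measured, so treat it as an
illustration and not as a sizing recommendation.

```python
import math

# RAM per instance type in GB. 15 GB is the m1.xlarge figure quoted in the
# post; 68.4 GB is the published size of m2.4xlarge (an addition here).
INSTANCE_RAM_GB = {"m1.xlarge": 15.0, "m2.4xlarge": 68.4}


def boxes_needed(target_ram_gb, instance_type):
    """Return how many boxes of the given type cover the target RAM."""
    return math.ceil(target_ram_gb / INSTANCE_RAM_GB[instance_type])


def scaling_step_gb(instance_type):
    """Return the RAM (in GB) that one more box of this type adds."""
    return INSTANCE_RAM_GB[instance_type]


if __name__ == "__main__":
    target = 120.0  # hypothetical total RAM the use case needs, in GB
    for name in INSTANCE_RAM_GB:
        print(name, "->", boxes_needed(target, name), "boxes,",
              scaling_step_gb(name), "GB per added box")
```

With the smaller type you need more boxes (so more network chatter), but
each scaling step is much smaller, which is the financial argument above.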