I'm researching the possibility of moving from Solr to Elasticsearch,
because to be honest it looks really cool and I can't wait to start
playing with it.
But before I start, I would like to get some tips/suggestions from more
experienced users. The things I'm interested in are:
What instance size is best in terms of cost versus power? (For
example, an m1.xlarge instance is twice as expensive as an m1.large, but will it
increase the performance of the ES cluster proportionally, in terms of response
time and the number of concurrent requests it can process?)
How many shards can I run on each node? (The default is 5; how many
more can I use without affecting performance?)
Is it better to have a separate cluster per index, or does it not matter
from a performance point of view?
EBS vs. ephemeral vs. SSD drives: how big is the performance difference?
Are ephemeral drives safe enough with a replication factor of, let's say, 3?
How consistent is the performance of ES on EC2? (Will response time
spike from time to time above 2-3 sec because of commits to the index?)
Statistics from my current Solr instances:
Number of instances: 3 (m1.xlarge)
Number of documents: ~15m
Requests per second: ~10
Results page size: 16 documents
Average total count per query: ~100k documents
Queries are quite different from one another, so they are not easy to cache.
We are using faceting, filtering by field, custom sorting, ...
One of the biggest issues with EC2 in general is availability. Instances
can go down. EBS volumes are not invulnerable either. When individual
instances or volumes go down you are typically fine because replication
saves you.
But when the whole zone has a problem, and that seems to happen at least
once per year, at least in the east zone, which is in Northern Virginia, then
replication within the same zone doesn't help. Then you start thinking
about having nodes in multiple zones, and with that comes extra cost.
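To make the zone argument concrete, here is a minimal sketch that checks which shards would have no surviving copy if one zone went away. The node names, zone labels, and shard layout are all made up for illustration; in a real cluster you would read the layout from the cluster state, and Elasticsearch's shard allocation awareness settings are the usual way to ask it to spread copies across zones.

```python
def shards_lost_if_zone_fails(placement, node_zone, failed_zone):
    """Return the ids of shards with no copy left outside failed_zone.

    placement: dict mapping shard id -> list of node names holding a copy
    node_zone: dict mapping node name -> availability zone label
    """
    lost = []
    for shard, nodes in placement.items():
        survivors = [n for n in nodes if node_zone[n] != failed_zone]
        if not survivors:
            lost.append(shard)
    return sorted(lost)


# Hypothetical 3-node layout: two nodes in zone "a", one in zone "b".
node_zone = {"n1": "a", "n2": "a", "n3": "b"}
placement = {
    0: ["n1", "n2"],  # both copies live in zone "a"
    1: ["n1", "n3"],  # copies spread over both zones
    2: ["n2", "n3"],
}

print(shards_lost_if_zone_fails(placement, node_zone, "a"))  # prints [0]
print(shards_lost_if_zone_fails(placement, node_zone, "b"))  # prints []
```

Shard 0 is replicated, but both copies sit in the same zone, so replication alone does not protect it from a zone outage.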
Instance size: depends, you'll want to test.
Shards per node: depends, you'll want to test.
Cluster per index: you can have multiple indices per cluster, as long as you are not
overwhelming it, so again it depends on the details.
EBS vs. ephemeral vs. SSD: I didn't test it, but I imagine the difference is big. That said, Amazon
has an option with guaranteed IOPS for EBS and just announced a new option
today, I think.
Ephemeral drives with replication: yes, typically safe enough, but see the paragraph above.
Consistency on EC2: there is no consistency, which is why it's hard to test performance on
EC2. Look up info on noisy neighbours.
On Wednesday, November 7, 2012 1:42:20 PM UTC-5, Karol Gwaj wrote:
what instance size will be the best considering economy/power
(so for example m1.xlarge instance is twice as expensive as
m1.large, but will it increase proportionally performance of ES
cluster (in terms of response time, and amount of concurrent
requests it can process))
Totally depends on your data. Will probably have to experiment here.
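Since the answer is "experiment", a small timing harness helps when comparing instance sizes. The sketch below is only a starting point: `run_query` is a hypothetical callable that you would replace with a real search request against your cluster, and `fake_query` is a stand-in so the script runs on its own. It reports the median, a nearest-rank 95th percentile, and the maximum, because occasional spikes do not show up in an average.

```python
import statistics
import time


def measure_latencies(run_query, queries, repeats=3):
    """Time every query `repeats` times; return latencies in seconds."""
    latencies = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            run_query(q)
            latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return median, nearest-rank 95th percentile and max."""
    ordered = sorted(latencies)
    # Nearest-rank percentile: ceil(0.95 * n) - 1, in integer arithmetic.
    p95_index = (95 * len(ordered) + 99) // 100 - 1
    return {
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


def fake_query(q):
    # Stand-in for a real search call; replace with your own client code.
    return sum(range(1000))


sample_queries = ["red shoes", "blue hat", "green coat"]  # made-up queries
print(summarize(measure_latencies(fake_query, sample_queries)))
```

Running the same query set against an m1.large and an m1.xlarge cluster, with your own data and realistic concurrency, is the only way to tell whether the price difference buys a proportional improvement.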
how many shards i can run on every node (by default it is 5,
how many more i can use without affecting performance)
You can likely go way higher than single digits. You don't want a
design where they can grow indefinitely, but don't be afraid of using
them. Keep in mind that for a single index you don't need more
than one per node. Watch some overview talks like this one to get a
better idea of what these concepts mean.
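As a rough illustration of the shard arithmetic, the sketch below builds an index settings body and counts how many shard copies each node would hold. `number_of_shards` and `number_of_replicas` are the standard index settings; the replica count of 1 and the 3-node cluster are assumptions for the example, not recommendations.

```python
import json


def index_settings(primaries, replicas):
    """Build a create-index request body with the given shard counts."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        }
    }


def shard_copies_per_node(primaries, replicas, nodes):
    """Return (total shard copies, average copies per node)."""
    total = primaries * (1 + replicas)
    return total, total / nodes


# Assumed example: the default of 5 primaries, 1 replica each, 3 nodes.
print(json.dumps(index_settings(5, 1)))
total, per_node = shard_copies_per_node(5, 1, 3)
print(f"{total} shard copies, about {per_node:.1f} per node")
```

With one primary per node (3 primaries on 3 nodes) and 1 replica, each node would hold 2 shard copies, which matches the "no more than one per node for a single index" rule of thumb above.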
is it better to have separate cluster per index or it doesn't
matter (from performance point of view)
Typically not.
EBS vs ephemeral vs SSD drives (how big is the performance difference ?)
Generally speaking, for a lot of data that can't fit in memory, SSDs
will improve your disk seeks. But you really have to profile to
determine if the cost is worth it.
are ephemeral drives safe enough with replication factor lets say 3
"Safe enough" can only be defined by you. It isn't terribly
likely that three EC2 nodes will disappear or get corrupted at once, but I've seen
it before. Normal storage practices apply here.
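For a back-of-the-envelope feel of "not terribly likely", the sketch below multiplies per-node loss probabilities. It assumes that node losses are independent and that a replication factor of 3 means three copies in total; the 1% figure is invented for illustration. Correlated failures, such as a whole zone going down, break the independence assumption, which is exactly the case where this estimate is too optimistic.

```python
def prob_all_copies_lost(p_node_loss, copies):
    """Probability that every copy is lost, assuming independent failures."""
    return p_node_loss ** copies


# Invented number: a 1% chance of losing any given node in some time window.
p = prob_all_copies_lost(0.01, 3)
print(f"chance of losing all 3 copies: {p:.2e}")
```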
how consistent is the performance of ES on EC2 (will response
time spike from time to time above 2-3 sec because of some commits
to the index?)
Depends on usage. Heavy indexing can affect search performance, but more
replicas help.
queries are quite different one from another, so they are not easy to cache
we are using faceting, filtering by field, custom sorting, ...
Faceting and sorting will typically use more memory than simple
queries. Start simple and gradually add functionality & data.
You'll get a feel for where limits are and whether you need to move
to bigger hardware.