Hi,
I have been trying to use ES for a search service for my site,
which is running on EC2 servers, and I am having a lot of performance
problems.
My data set is roughly 1.2M records of 30 fields each, mostly
Integer or Long numbers, plus a few short Strings and a geo location
point.
Most of the search queries use a native custom score script.
I need the ES cluster to handle around 300 concurrent requests,
with a throughput of roughly 40 requests per second.
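As a rough sanity check on those two numbers, here is a back-of-the-envelope sketch using Little's law (concurrency = throughput x mean latency). It assumes a closed loop in which every client sends its next request as soon as the previous one returns; the function names are only illustrative.

```python
def mean_latency_budget(concurrent_requests, requests_per_second):
    """Mean response time (seconds) that sustains the given throughput
    when `concurrent_requests` requests are always in flight."""
    return concurrent_requests / requests_per_second


def throughput_at_latency(concurrent_requests, mean_latency_seconds):
    """Requests per second completed when each in-flight request
    takes `mean_latency_seconds` on average."""
    return concurrent_requests / mean_latency_seconds


if __name__ == "__main__":
    # Target stated above: 300 concurrent requests at 40 requests/second.
    print(mean_latency_budget(300, 40))
    # Hypothetical comparison: 100 clients with 2-second responses.
    print(throughput_at_latency(100, 2.0))
```

Under that assumption, 300 concurrent requests at 40 requests per second corresponds to a mean response time of 7.5 seconds, so the throughput target by itself does not demand fast individual responses.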
I have a staging environment with around 2k records of the
same nature as the production data set. With the default
sharding configuration and only one ES instance, it achieves
over 30 requests per second at the concurrency mentioned
above.
My production cluster consists of 2 c1.xlarge instances,
which are 8-core boxes with 7 GB of memory each, running Ubuntu
10.10 and ES 0.16.2.
Here is my elasticsearch.yml configuration file: https://gist.github.com/1066379
Right now I am testing a sharding configuration of 10 shards and 1
replica.
The nodes discover each other without problems. I use the bulk API
(through the Java API) to index the data in batches of around 1k
documents, with no indexing performance problems, and my client is
connected to both ES instances.
Before going live with the new service I decided to stress test
it, and after a lot of testing I found that I cannot get anywhere
near my desired performance.
With this configuration, when I run the stress test with 100
concurrent clients making requests, after around 10 seconds the system
load on the Elasticsearch servers rises above 10 and
I start getting timeouts (8 seconds) in the stress test.
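For reference, the kind of stress test described above can be sketched roughly as follows. This is a simplified stand-in written in Python, not my actual test harness: `send_query` is a hypothetical callable that performs one search request, and slow calls are only counted as timed out after they return rather than being aborted at 8 seconds.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_stress_test(send_query, clients=100, requests_per_client=10,
                    timeout_seconds=8.0):
    """Run `clients` threads, each issuing requests back to back.

    `send_query` is a hypothetical zero-argument callable that performs
    one search. Calls slower than `timeout_seconds` are counted as
    timed out (they are not interrupted).
    """
    def worker(client_id):
        ok = timed_out = 0
        for _ in range(requests_per_client):
            started = time.monotonic()
            send_query()
            if time.monotonic() - started > timeout_seconds:
                timed_out += 1
            else:
                ok += 1
        return ok, timed_out

    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        results = list(pool.map(worker, range(clients)))
    elapsed = time.monotonic() - started

    ok = sum(r[0] for r in results)
    timed_out = sum(r[1] for r in results)
    total = ok + timed_out
    return {
        "ok": ok,
        "timed_out": timed_out,
        "elapsed_seconds": elapsed,
        # Guard against a zero-length run on very coarse clocks.
        "requests_per_second": total / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in for a real search call: sleeps 10 ms per request.
    print(run_stress_test(lambda: time.sleep(0.01), clients=20,
                          requests_per_client=5))
```

Watching system load on each node while something like this runs is how I see the load pass 10 on one server.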
During this time there are no exceptions in the ES log, which is at
INFO level.
I have also noticed an uneven distribution of the load: most of
the time the master node has a system load above 10 while the other
node has a system load of 2-3.
I have also tested without the custom script, with the same
results.
For the search queries I am using a BoolQuery with a few range
terms and a couple of Integer terms, and I am not using full-text
search.
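To make the query shape concrete, here is a minimal sketch that builds an equivalent JSON request body for the REST search endpoint. The field names and values are made up for illustration and are not my real mapping, and the `from`/`to` range parameters reflect my understanding of the query DSL in this ES version, so treat them as an assumption.

```python
import json


def build_search_body(ranges, terms, size=10):
    """Build a bool query with one `range` clause per entry in `ranges`
    and one `term` clause per entry in `terms`, all under `must`.

    ranges: {field: (low, high)}   (hypothetical numeric fields)
    terms:  {field: value}         (hypothetical integer fields)
    """
    must = []
    for field, (low, high) in ranges.items():
        must.append({"range": {field: {"from": low, "to": high}}})
    for field, value in terms.items():
        must.append({"term": {field: value}})
    return {"size": size, "query": {"bool": {"must": must}}}


if __name__ == "__main__":
    body = build_search_body(
        ranges={"price": (10, 100), "created_at": (1300000000, 1310000000)},
        terms={"category_id": 3, "status": 1},
    )
    print(json.dumps(body, indent=2))
```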
If you have endured my ranting so far, here are a couple of
questions:
Is my configuration ok? Am I missing something?
What would you recommend to achieve my desired
performance? More small servers, more extra-large instances, more
shards, fewer shards?
Should I discard ElasticSearch and use a different solution?
Thank you in advance,
Best regards,
Ariel Amato