Hi,
We are a small startup using Elasticsearch with a schema in which most of our documents resemble users/albums/photos docs, containing fields such as album names, photo size, comments, etc.
We had a great experience with Elasticsearch until we hit a scaling issue.
The main problem seems to be write operations. They gradually started to consume a tremendous amount of CPU.
We've been racking our brains over this for the past week, and I really hope this esteemed forum can help us.
I'll try to describe the setup and use case, and provide as much info as I can to help profile it:
- ElasticSearch version is 0.20.4
- A single shard with a single replica on an Amazon m1.large machine (7.5 GB RAM, 2 virtual CPUs with 2 EC2 compute units each)
- About 150,000,000 records
- ES size on disk is 250 GB (index + data). How can I tell the breakdown, i.e. how much the index takes and how much the stored data takes?
- Right now we are at about 20-30 writes per second, which drives CPU usage to almost 100%. Those writes are almost exclusively _update scripts that add or change a single(!) field. I would have thought this is a cheap operation (obviously I have reached an unhealthy point where it costs a lot of CPU).
- Heap size is set to 5.8 GB, and JVM heap usage cycles between 3 and 5 GB. The rest of the parameters are as in a default deployment of ES.
- The CPU utilization pattern can be seen here: http://i.imgur.com/pmg97ty.png
- I created a gist of the output of the _hot_threads and _stats endpoints here. Note that there are two files in the gist, and that the relevant index is called my_index_more; the others were just sandboxes.
- I have 80 segments, many of them at 5 GB (the default maximum). Should I have fewer segments? Increase the max segment size?
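To make the write pattern concrete, here is a minimal Python sketch of the kind of request involved. It assumes the script-based body format of the update API (a "script" string plus "params"). The type name "photo", the document id "42", and the field "album_name" are hypothetical placeholders, not our real mapping.

```python
import json


def build_update_request(index, doc_type, doc_id, field, value):
    """Build the path and JSON body for a script-based single-field update.

    Assumption: `field` is a plain identifier, so it can be embedded
    directly in the script text; the new value is passed via params.
    """
    path = "/{}/{}/{}/_update".format(index, doc_type, doc_id)
    body = {
        # The script assigns exactly one field on the stored source.
        "script": "ctx._source.{} = value".format(field),
        "params": {"value": value},
    }
    return path, json.dumps(body, sort_keys=True)


# Hypothetical example values; only the index name comes from the post.
path, body = build_update_request(
    "my_index_more", "photo", "42", "album_name", "Holidays"
)
print("POST", path)
print(body)
```

Each of the 20-30 writes per second is one such request against a single document.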
Have I gone past the point of a "healthy" Lucene index? Why? What should I do? Shard the data across several Lucene indexes?
What is a healthy number of segments in an index? What is a healthy size per shard (both in GB and in document count)?
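For a rough sense of scale, the figures above can be turned into a few back-of-the-envelope ratios. This is only a sketch: it assumes 1 GB = 10^9 bytes, that the 250 GB figure is the total on-disk size on this one machine, and that whatever RAM is not given to the JVM heap is the upper bound on what the OS could use for file caching.

```python
# Figures taken from the setup description above.
records = 150_000_000
size_on_disk_gb = 250
ram_gb = 7.5
heap_gb = 5.8

# Assumption: 1 GB = 10**9 bytes.
bytes_per_record = size_on_disk_gb * 10**9 / records

# Memory left outside the JVM heap (upper bound for OS file cache).
non_heap_gb = ram_gb - heap_gb

# How many times larger the on-disk data is than that leftover memory.
disk_to_non_heap_ratio = size_on_disk_gb / non_heap_gb

print("average bytes per record: {:.0f}".format(bytes_per_record))
print("RAM outside the heap (GB): {:.1f}".format(non_heap_gb))
print("disk size / non-heap RAM: {:.0f}x".format(disk_to_non_heap_ratio))
```

These numbers are not a diagnosis, just a way to put the shard size next to the resources of the machine.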
I guess the real problem is that I'm missing intuition here.
What tools should I use to analyze this further?
What should I read to gain a better understanding of my problem?
Hope you guys can help!