Performance Issues


We are a small startup using Elasticsearch in a scheme where most of our documents resemble users/albums/photos docs, containing fields such as album names, photo sizes, comments, etc.

We've had a great experience with Elasticsearch, up until we hit a scaling issue.
The main problem seems to be write operations: they have gradually started to consume a tremendous amount of CPU.
We've been cracking our heads over this for the past week, and I really hope this esteemed forum can help us.

I'll try to describe the setup and use case, and provide as much info as possible to help profile:

  • Elasticsearch version is 0.20.4
  1. A single shard with a single replica on an Amazon m1.large machine (7.5 GB RAM, 2 virtual CPUs with 2 EC2 compute units each)

  2. About 150,000,000 records

  3. ES size is 250 GB (index + data). How can I tell the breakdown, i.e. how much the index takes vs. how much the stored data?

  4. Right now we are doing about 20-30 writes per second, which drives CPU usage to almost 100%. Those writes are almost exclusively _update scripts that add or change a single(!) field. I would have thought this was a cheap operation (obviously I've reached an unhealthy point where it costs a lot of CPU).

  5. Heap size is set to 5.8 GB; the JVM shows usage cycling between 3 and 5 GB. The rest of the parameters were left at the default ES deployment values.

  6. CPU utilization pattern can be seen here:

  7. I created a gist of the _hot_threads endpoint and the _stats endpoint here.
    Note that there are two files in the gist, and that the relevant index is called my_index_more; the others were just sandboxes.

  8. I have 80 segments, many of them at 5 GB (the maximum defined by default)
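For reference, the updates mentioned in point 4 look roughly like this (the document id, field name, and params are made up for illustration; ES 0.20.x uses MVEL for update scripts):

```shell
# Hypothetical example of the single-field _update calls described above.
# Each call runs a script that changes one field of one document.
curl -XPOST 'localhost:9200/my_index_more/photo/12345/_update' -d '{
    "script" : "ctx._source.comment_count = count",
    "params" : { "count" : 42 }
}'
```

Note that internally an _update is a get of the whole document followed by a reindex of the whole document, so it is not an in-place write of a single field.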

Should I have fewer segments? Should I increase the max segment size?
Have I passed the point of a "healthy" Lucene index? Why? What should I do? Shard the data across several Lucene indices?
What is a healthy number of segments in an index? What's a healthy size per shard (GB-wise and document-count-wise)?

And I guess the real question is that I'm missing intuition here.
What tools should I use to analyze further?
What should I read to gain a better understanding of my problem?
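(For the size-breakdown question in point 3, the stock REST endpoints already expose most of this; a sketch, assuming the index name from this thread and a node on localhost:)

```shell
# Store size in bytes and document count for the index:
curl 'localhost:9200/my_index_more/_stats?pretty'

# Per-segment breakdown: size_in_bytes, num_docs, deleted docs, per segment:
curl 'localhost:9200/my_index_more/_segments?pretty'
```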

Hope you guys can help!

You are on the right track, I think. If you have a heap of 5.8 GB and
the large segment setting is 5 GB, it will work, but you will observe severe
load peaks. My experience with 5 GB large segments is the same. So I
always use 2 GB for my kind of heap (4-8 GB) and workload (doc writing, ~1k

While writing new docs into many large segments, the chance is high
that Lucene enters large segment merging more often. Choose a smaller
size, and see how your JVMs cope with the streamlined segment
merging. If you are heavy on search, you might need more "optimizing" (=
reducing segment count) after indexing phases.
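The suggestion above can be sketched as follows, assuming the default tiered merge policy (on 0.20.x, verify in the docs that this setting can be applied dynamically on your version; existing 5 GB segments are not rewritten merely by lowering the limit):

```shell
# Lower the maximum merged segment size from the 5 GB default to 2 GB:
curl -XPUT 'localhost:9200/my_index_more/_settings' -d '{
    "index.merge.policy.max_merged_segment" : "2gb"
}'

# After an indexing phase, reduce the segment count explicitly
# (max_num_segments value here is just an illustration):
curl -XPOST 'localhost:9200/my_index_more/_optimize?max_num_segments=10'
```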


On 04.06.13 07:57, eranid wrote:

  1. I have 80 segments, many of them are at 5GB (the maximum defined by


So, your advice is to limit the maximum segment size to 2 GB for a heap of 5.8 GB?

Could you elaborate on why this would help? Are the segments loaded into memory interchangeably when I do a "write" operation?

It sounds like I'm misunderstanding something, but could it be that:
the correct segment is identified, loaded into memory, changed according to the script, and then written back to disk?

Also, what happens if I change the maximum segment size now? Will the segments slowly be reduced from 5 GB to 2 GB by Elasticsearch?

What is the price/tradeoff of having a lot of smaller segments?

Thanks a lot!