Performance Issues

Eran_Eidinger · June 4, 2013, 5:57am

Hi,

We are a small startup, and are using elasticsearch in a scheme where most of our documents are similar to users/albums/photos docs, containing fields such as album names, photo size, comments etc.

We've had a great experience with elasticsearch, up until we've hit a scaling issue.
The main problem seems to be write operations. They gradually started to consume a tremendous amount of CPU.
We've been racking our brains over this for the past week and I really hope this esteemed forum can help us.

I'll try to describe the setup and use case, and provide as much info as I can to help profile it:

ElasticSearch version is 0.20.4

A single shard with a single replica on an Amazon m1.large machine (7.5 GB RAM, 2 virtual CPUs with 2 EC2 compute units each)
About 150,000,000 records
ES size is 250 GB (index + data) - How can I tell the breakdown, i.e. how much the index takes and how much the storage takes?
Right now we are at about 20-30 writes per second - This drives CPU usage to almost 100%. Those writes are almost exclusively _update scripts that add or change a single(!) field. I would have thought this is a cheap operation (obviously I have reached an unhealthy point where it costs a lot of CPU).
Heap size is set to 5.8 GB, and the JVM shows usage cycling between 3 and 5 GB. The rest of the parameters are as in a default deployment of ES.
The CPU utilization pattern can be seen here: http://i.imgur.com/pmg97ty.png
I created a gist of the _hot_threads endpoint and the _stats endpoint here.
Note that there are two files in the gist, and that the relevant index is called my_index_more; the others were just sandboxes.
I have 80 segments, many of them at 5 GB (the maximum defined by default)
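
One way to get a feel for the segment count and per-segment sizes listed above is to summarize the output of the indices segments API (GET /my_index_more/_segments). The sketch below is Python and works on an already-parsed response. The field names it reads (segments, num_docs, deleted_docs, size_in_bytes) are what I would expect that API to return, but treat the exact response shape and the tiny sample data as assumptions to check against your own cluster.

```python
def summarize_segments(segments_response, index_name):
    """Return one summary dict per shard copy of the given index.

    Assumes the response is shaped like
    indices -> <index> -> shards -> <shard id> -> [shard copies],
    where each copy has a "segments" mapping of segment name -> stats.
    """
    summaries = []
    shards = segments_response["indices"][index_name]["shards"]
    for shard_id, copies in sorted(shards.items()):
        for copy in copies:
            segments = list(copy["segments"].values())
            sizes = [seg["size_in_bytes"] for seg in segments]
            live = sum(seg["num_docs"] for seg in segments)
            deleted = sum(seg["deleted_docs"] for seg in segments)
            total_docs = live + deleted
            summaries.append({
                "shard": shard_id,
                "segment_count": len(segments),
                "total_bytes": sum(sizes),
                "largest_bytes": max(sizes) if sizes else 0,
                # Share of documents on disk that are only delete markers.
                "deleted_fraction": deleted / total_docs if total_docs else 0.0,
            })
    return summaries


# Hypothetical sample data, far smaller than the real index.
sample_response = {
    "indices": {
        "my_index_more": {
            "shards": {
                "0": [
                    {
                        "segments": {
                            "_a": {"num_docs": 1000, "deleted_docs": 250,
                                   "size_in_bytes": 4000},
                            "_b": {"num_docs": 500, "deleted_docs": 0,
                                   "size_in_bytes": 1000},
                        }
                    }
                ]
            }
        }
    }
}

for row in summarize_segments(sample_response, "my_index_more"):
    print(row)
```

A high deleted fraction in the large segments would be one hint that the update traffic is leaving many stale copies behind for merges to clean up.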

Should I have fewer segments? Increase the max segment size?
Have I gone past the point of a "healthy" Lucene index? Why? What should I do? Shard the data across several Lucene indexes?
What is a healthy number of segments in an index? What's a healthy size per shard (GB-wise and document-count-wise)?

And I guess the real problem is that I'm missing intuition here.
What tools should I use to analyze this further?
What should I read to gain a better understanding of my problem?

Hope you guys can help!

jprante · June 4, 2013, 5:08pm

You are on the right track, I think. If you have a heap of 5.8 G and
the large segment setting is 5 G, it will work, but you will observe severe
load peaks. My experience with 5G large segments is the same. So I
always use 2G for my kind of heap (4-8G) and workload (doc writing, ~1k
dps).

While writing new docs on many large sized segments, the chance is high
that Lucene enters large segment merging more often. Choose a smaller
size, and see how your JVMs can cope with the streamlined segment
merging. If you are heavy on search, you might need more "optimizing" (=
reducing segment numbers) after indexing phases.
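
A minimal sketch of how the smaller segment cap could be requested, assuming the relevant setting is index.merge.policy.max_merged_segment (the tiered merge policy's maximum merged segment size) and that it can be changed on a live index through the update settings API in this version; both are assumptions to verify against the documentation for your release. The host and port are hypothetical. The Python below only builds the request and does not send it.

```python
import json
from urllib.request import Request


def build_max_segment_request(base_url, index_name, max_size):
    """Build (but do not send) a PUT to the index's _settings endpoint."""
    # Assumed setting name; check it against your Elasticsearch version.
    body = {"index": {"merge.policy.max_merged_segment": max_size}}
    url = "{}/{}/_settings".format(base_url.rstrip("/"), index_name)
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host; "2gb" is the value suggested in the post above.
request = build_max_segment_request(
    "http://localhost:9200", "my_index_more", "2gb"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would actually send it; it is probably worth trying on one of the sandbox indexes first.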

Jörg

Eran_Eidinger · June 4, 2013, 8:27pm

So, your advice is to limit the maximum segment size to 2GB for a heap of 5.8GB?

Could you elaborate on why this would help? Are the segments swapped in and out of memory when I do a "write" operation?

It sounds like I'm misunderstanding something, but can it be that:
The correct segment is identified, loaded into memory, changed according to the script and then written back to disk?

Also, what happens if I change the maximum segment size now? Will the segments be slowly reduced from 5GB to 2GB by elasticsearch?

What is the price/tradeoff of having a lot of smaller segments?

Thanks a lot!
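
On the question of what a write does to a segment: Lucene segments are written once and not modified in place, so an update amounts to reading the current document, applying the change, marking the old copy as deleted and indexing the whole new document into a new segment. The Python below is a toy model of that behaviour, not Elasticsearch or Lucene code; ToyIndex and everything in it is hypothetical and exists only for illustration.

```python
class ToyIndex:
    """Toy model of write-once segments with delete markers."""

    def __init__(self):
        self.segments = []    # each segment is a tuple of (doc_id, doc) pairs
        self.deleted = set()  # (segment number, position) of stale copies
        self.live = {}        # doc_id -> (segment number, position)

    def index(self, doc_id, doc):
        # An existing copy is only marked as deleted, never rewritten.
        if doc_id in self.live:
            self.deleted.add(self.live[doc_id])
        # The new copy always lands in a freshly written segment.
        self.segments.append(((doc_id, dict(doc)),))
        self.live[doc_id] = (len(self.segments) - 1, 0)

    def get(self, doc_id):
        seg_no, pos = self.live[doc_id]
        return dict(self.segments[seg_no][pos][1])

    def update(self, doc_id, field, value):
        # Changing one field still reindexes the whole document.
        doc = self.get(doc_id)
        doc[field] = value
        self.index(doc_id, doc)


idx = ToyIndex()
idx.index("photo-1", {"album": "holiday", "size": 10})
idx.update("photo-1", "comment", "nice")
print(len(idx.segments), sorted(idx.deleted), idx.get("photo-1"))
```

In this model the stale copies only go away when segments are merged, which is one plausible reason a steady stream of single-field updates keeps the merge machinery busy.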
