Hi everybody, I'm stuck on cluster performance tuning/scaling and need help.
I'm implementing a solution on top of Elasticsearch. I've started load testing, and just 5 parallel request threads bring my cluster down.
In short, I have this configuration:
1 node - all roles, 2x6 cores, 64 GB RAM.
ES_HEAP_SIZE=30g
bootstrap.mlockall: true
indices.fielddata.cache.size: 20%
network.tcp.blocking: true
Everything else is at the defaults.
My main index is now 5+ million documents and about 80 GB.
For future expansion, it's laid out as 12 shards.
The basic query is quite heavy. It filters on nothing but 2 types (there are only 2 of them for now), but it runs several (5-7) aggregations over the whole document set. With 1 thread, query time is acceptable, about 350-700 ms. But in multithreaded test mode the CPU immediately shoots up to 100%.
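To give an idea of the shape (the index name and the field/aggregation names below are placeholders, my real ones differ; the structure is what I actually run):

curl -s 'localhost:9200/myindex/bidutp,prgos/_search?pretty' -d '{
  "size": 0,
  "aggs": {
    "by_region":   { "terms": { "field": "region" } },
    "price_stats": { "stats": { "field": "price" } },
    "avg_score":   { "avg":   { "field": "score" } }
  }
}'

The real request has 5-7 aggregations like these, all over the full document set.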
In _nodes/hot_threads I see:
100.1% (500.4ms out of 500ms) cpu usage by thread 'elasticsearch[node-4][search][T#9]'
94.7% (473.6ms out of 500ms) cpu usage by thread 'elasticsearch[node-4][search][T#15]'
92.8% (463.9ms out of 500ms) cpu usage by thread 'elasticsearch[node-4][search][T#25]'
(I can provide more details if needed.)
And I even get EsRejectedExecutionException in the ES log.
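For what it's worth, I can watch the rejections live with the standard _cat/thread_pool endpoint (if I'm using it right); the search queue fills up within seconds of the test starting:

curl -s 'localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected'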
If I profile the query, I can see that most of the time (and apparently most of the CPU) goes to the aggregations:
"took": 462,
.....
"query": [
{
"query_type": "ConstantScoreQuery",
"lucene": "ConstantScore((ConstantScore(_type:bidutp) ConstantScore(_type:prgos))~1)",
"time": "81.26062800ms",
.....
"name": "MultiCollector",
"reason": "search_multi",
"time": "348.6118540ms",
(I can post the full output if needed.)
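For the record, that output comes from simply adding "profile": true to the same search request (again with the placeholder names from above):

curl -s 'localhost:9200/myindex/bidutp,prgos/_search?pretty' -d '{
  "profile": true,
  "size": 0,
  "aggs": {
    "by_region": { "terms": { "field": "region" } }
  }
}'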
So now we come to the questions.
What am I doing wrong?
Is it a matter of shard count? Should I reduce the heap, or monitor GC?
Or is there something else I should investigate?
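On the GC question: so far I've only been eyeballing the old-gen collector counters from the nodes stats API, roughly like this, but I don't know what numbers should worry me:

curl -s 'localhost:9200/_nodes/stats/jvm?pretty' | grep -A 3 '"old"'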
I will definitely add some nodes to the cluster (2-3; I don't have tons of them in my pocket), but I need to understand whether that will be enough.
I'd be grateful for any advice.