Please suggest performance settings for my ElasticSearch cluster

renego · September 11, 2016, 8:53am

Hi, I am having trouble with my ElasticSearch cluster and don't know what to do. Here are my settings:

Server:
64GB of RAM
8 cores
500GB SSD

Data:
10 different indices, each with 1M rows
10M rows in total
11GB of data

Current ES cluster:
3 nodes on the same server
1 master node and 2 slave nodes
8g Heap size for each node
bootstrap.mlockall: true

Search requests:
Search Rate: 150 /s
Search Latency: 1ms
I have a lot of aggregations and filters in each search request

Problem:
When JVM heap usage goes over 90%, the nodes stop responding. After I restart them, everything works fine for about 3 days, until the heap reaches 90% again and the cluster stops responding.

Here is the graph when the heap is > 90%

[Imgur album]

What I need:

Can someone suggest settings for my elasticsearch.yml so that the caches are handled well, given the setup above?
What can I do about the >90% JVM heap problem?

Thanks
Nik

Christian_Dahlqvist · September 11, 2016, 9:13am

Why are you running multiple nodes on the server when you can simply run a single node with 30GB of heap?

renego · September 11, 2016, 9:15am

I tried that, but garbage collection took longer and there were timeouts whenever the GC started to run.

Christian_Dahlqvist · September 11, 2016, 9:27am

What does your node configuration look like? Do you have any custom settings or are you running with the defaults?

renego · September 11, 2016, 9:31am

These are my settings:

cluster.name: XXXX
node.name: XXX-master
node.data: true
node.master: true

bootstrap.mlockall: true
index.merge.policy.merge_factor: 5

# Bulk pool
threadpool.bulk.type: fixed
threadpool.bulk.size: 1000
threadpool.bulk.queue_size: 30000

# Index pool
threadpool.index.type: fixed
threadpool.index.size: 1000
threadpool.index.queue_size: 10000

#search pool
threadpool.search.queue_size: 4000

index.cache.query.enable: true
index.requests.cache.enable: true
indices.cache.query.size: 25%

indices.fielddata.cache.size: 25%
indices.cache.filter.size: 25%
transport.tcp.compress: true;

#index
index.store.type: mmapfs

network.bind_host: xxx.xxx.xxx.xxx
network.publish_host: xxx.xxx.xxx.xxx
network.host: xxx.xxx.xxx.xxx
discovery.zen.ping.unicast.hosts: ["xxx.xxx.xxx.xxx"]
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.timeout: 10s
transport.tcp.port: 9300
http.port: 9200
http.max_content_length: 500mb
index.routing.allocation.disable_allocation: false

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

script.engine.groovy.inline.aggs: on
script.inline: on
script.indexed: on
index.max_result_window: 10000

When the JVM heap is high, I start getting messages like this in the log files:

[2016-09-11 09:03:54,738][WARN ][transport ] [XXX-master-1] Transport response handler not found of id [16262715]

warkolm · September 11, 2016, 9:41am

No no no no no.
All of this is in memory and having such ridiculously large settings is going to add more heap pressure.

renego · September 11, 2016, 9:46am

To explain the process:

As I said, I have 10 indices, each with 1M rows. The indices hold job ads for 10 different countries, one index per country. Once a day, at different times, I rebuild each index. This means I create a new index for the selected country, fill it with data and remove the old index.

Should I remove the settings above?

Christian_Dahlqvist · September 11, 2016, 10:16am

Yes. Start with the default settings and then start tweaking from that point if necessary. More is not always better when it comes to these settings and the defaults are generally very good in my experience.

renego · September 11, 2016, 10:47am

Ok I will remove them. What about the cache settings? Are they ok? What do you suggest?

Christian_Dahlqvist · September 11, 2016, 12:52pm

Unless you have reached these through systematic testing and evaluation, I would recommend starting with all default settings. Whether those cache settings are right or not for your use case is impossible for me to tell.

renego · September 11, 2016, 4:32pm

The problem is that I don't know. Should the JVM heap reach 90% on a well-optimized node or not? What is a good average heap percentage?
Why does my cluster die when the JVM heap is > 90%? I cannot understand the problem.

Christian_Dahlqvist · September 13, 2016, 3:38pm

Does it reach 90% after you have removed your custom settings?

renego · September 22, 2016, 9:16am

Yes! Every 2 days the JVM heap goes over 90% and the nodes die.

Any suggestions on what to do? If you need some stats, I can provide them.

Christian_Dahlqvist · September 22, 2016, 10:52am

Can you please post your full current configuration? What does Marvel show with respect to heap usage over these 2 days before it reaches 90%?

Kim-Kruse-Hansen · September 25, 2016, 5:00pm

I have suffered from the exact same problem. I have tried various heap sizes, going from 8 to 12 to 16 to 20 GB, and none was sufficient. Each node has to be restarted every 2 days or so. Old-generation GC pauses are extremely long, in the range of 2 minutes or more.

My latest experiment is to maximize the heap at 30 GB, and the cluster is now on day 5. So this has definitely helped, but I am still seeing a small growth in heap usage. Sooner or later I will probably have to restart the nodes.

I am also monitoring heap usage; if it is consistently over 90%, a script restarts the node automatically.
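
For readers who want to see what such a check could look like, here is a minimal Python sketch, not the script actually used in this thread. It assumes the heap figure is read from the `jvm.mem.heap_used_percent` field of a parsed node stats response (`GET /_nodes/stats/jvm`). The names `HeapWatchdog` and `heap_used_percent`, the window of 5 samples, and the sample response are made up for illustration. The restart itself is left out because it depends on how the node is run.

```python
from collections import deque

HEAP_THRESHOLD_PERCENT = 90  # from the thread: nodes stop responding above 90%
CONSECUTIVE_SAMPLES = 5      # assumption: how many samples count as "consistently"


def heap_used_percent(node_stats):
    """Map node name -> heap_used_percent from a parsed node stats response."""
    return {
        node["name"]: node["jvm"]["mem"]["heap_used_percent"]
        for node in node_stats["nodes"].values()
    }


class HeapWatchdog:
    """Flags a node only after `window` consecutive samples above `threshold`."""

    def __init__(self, threshold=HEAP_THRESHOLD_PERCENT, window=CONSECUTIVE_SAMPLES):
        self.threshold = threshold
        self.window = window
        self.samples = {}  # node name -> most recent heap readings

    def observe(self, node_name, percent):
        """Record one reading; return True if the node should be restarted."""
        history = self.samples.setdefault(node_name, deque(maxlen=self.window))
        history.append(percent)
        return len(history) == self.window and all(
            p > self.threshold for p in history
        )


if __name__ == "__main__":
    # Hand-written sample shaped like a node stats response (not real output).
    sample = {
        "nodes": {
            "abc123": {"name": "node-1", "jvm": {"mem": {"heap_used_percent": 93}}}
        }
    }
    watchdog = HeapWatchdog(window=3)
    for _ in range(3):
        for name, percent in heap_used_percent(sample).items():
            if watchdog.observe(name, percent):
                print(f"{name}: heap consistently above {watchdog.threshold}%")
```

Requiring several consecutive readings avoids reacting to a single spike just before a garbage collection brings the heap back down.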

renego · September 25, 2016, 6:33pm

Hi Kim, this is a very bad solution. I don't want to have this stress every day. This is impossible! It must be some setting, but we haven't found it.

For example, I have another project where the index is only 2GB in size, there are 30-40 requests per second, and the server and heap are working fine. There I have only one index.

But on this problematic project I have 10 indices, each with 1GB of data. The heap goes over 90% very quickly and the nodes stop responding.

I hope someone from elasticsearch team can help us to solve this problem!

renego · September 25, 2016, 6:36pm

Hi Christian, here is the data:

[Imgur album]

The settings of the node:

cluster.name: xxxx
node.name: xxxx-master
node.data: true
node.master: true

bootstrap.mlockall: true
index.merge.policy.merge_factor: 5

threadpool.index.queue_size: 10000
index.cache.query.enable: true

transport.tcp.compress: true;
index.store.type: mmapfs

network.bind_host: xxx.xxx.xxx.xxx
network.publish_host: xxx.xxx.xxx.xxx
network.host: xxx.xxx.xxx.xxx
discovery.zen.ping.unicast.hosts: ["xxx.xxx.xxx.xxx"]
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.timeout: 10s
transport.tcp.port: 9300
http.port: 9200
http.max_content_length: 500mb
index.routing.allocation.disable_allocation: false

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

script.engine.groovy.inline.aggs: on
script.inline: on
script.indexed: on
index.max_result_window: 10000

As you suggested, I have removed all the additional nodes and now run only one node with 30GB of heap. This doesn't help! After one day the heap is still 90% full and the node dies. It stops responding.

I now have about 3x more heap space than the size of my index.

Any suggestions?

Thanks
Nik

warkolm · September 25, 2016, 8:16pm

You appear to have too many shards; nearly 200 for only 6GB of data is going to waste a lot of resources.

What is in your slow log?

You should remove all of those custom settings; they are either pointless or dangerous.

renego · September 25, 2016, 8:55pm

Hi, where did you see this 200? I actually have 8 shards for each index.

Is index.store.type: mmapfs not the best choice for an Elasticsearch index?

Thanks
Nik
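
For reference, the shard arithmetic behind this exchange can be sketched in a few lines of Python. The 10 indices, 8 shards per index, "nearly 200" shards and 6GB figures come from the posts above. The replica count of 1 is an assumption, since the thread never states it, and the function names are made up for illustration.

```python
def total_shards(indices, primaries_per_index, replicas_per_primary):
    """Total shard copies (primaries plus replicas) across all indices."""
    return indices * primaries_per_index * (1 + replicas_per_primary)


def avg_shard_size_mb(data_gb, shards):
    """Average data per shard in MB, assuming data is spread evenly."""
    return data_gb * 1024 / shards


if __name__ == "__main__":
    # 10 indices x 8 primaries, with an ASSUMED 1 replica per primary.
    shards = total_shards(indices=10, primaries_per_index=8, replicas_per_primary=1)
    print("shards with 1 replica:", shards)

    # Average shard size for the figures quoted from Marvel: ~200 shards, 6GB.
    print("MB per shard at 200 shards:", round(avg_shard_size_mb(6, 200), 1))

    # Same data with a hypothetical 1 primary per index and 1 replica.
    fewer = total_shards(indices=10, primaries_per_index=1, replicas_per_primary=1)
    print("MB per shard at", fewer, "shards:", round(avg_shard_size_mb(6, fewer), 1))
```

Under these assumptions, 8 shards per index already gives 160 shard copies, each holding only a few tens of megabytes. The thread does not explain the gap between that and the "nearly 200" shown in Marvel.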

warkolm · September 25, 2016, 9:45pm

It's in the first picture you posted (in Marvel/Monitoring).

Have a read of https://www.elastic.co/guide/en/elasticsearch/reference/2.4/index-modules-store.html#file-system
