So some of my devops team went to training in NYC last week. One of the points they brought back was that our client node behavior puts unnecessary strain on the heap. How would we actually identify that? Are there configuration options we can use to make sure the client node isn't doing any reductions? Better ways to write the queries?
What heap size are you giving your client nodes? Client nodes have to do a reduce operation after queries are scattered to the individual shards, and that can put memory pressure on them. Also, are you doing deep paging in your queries?
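If deep paging is the culprit, the scroll API is usually the cheaper way to walk a large result set, since the client node never has to merge and sort `from + size` hits from every shard. A minimal sketch, assuming ES 2.x-style scroll syntax and a hypothetical index called `logs`:

```
# First request opens a scroll context; only "size" hits per shard are kept around.
curl -XGET 'http://localhost:9200/logs/_search?scroll=1m' -d '{
  "size": 500,
  "query": { "match_all": {} }
}'

# Each following page is fetched with the _scroll_id from the previous response.
curl -XGET 'http://localhost:9200/_search/scroll' -d '{
  "scroll": "1m",
  "scroll_id": "<scroll_id from previous response>"
}'
```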
You can check the heap usage with:

```
curl -XGET 'http://localhost:9200/_nodes/<client_node_name>/stats/jvm'
```
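If you only care about a few numbers, you can trim that response down (a sketch, assuming your version supports `filter_path`, which was added in ES 1.6):

```
# Return only the node name, heap usage percentage, and GC collector stats.
curl -XGET 'http://localhost:9200/_nodes/<client_node_name>/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors'
```

`heap_used_percent` staying high between collections and old-gen GC time climbing are the usual signs the client node's heap is too small for the reduce work it is doing.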
I'd put SPM on your Client nodes and your other nodes, then look at the JVM Memory reports: select your Client node(s) and your other ES nodes and compare GC activity patterns and GC memory pool sizes to see whether the Client nodes are behaving poorly compared to the other ES nodes. Based on that, you may want to go in and tweak JVM params and such. Once you start tweaking, you can use the same reports/view to see whether you've managed to improve performance/GC or not.
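If you want a rough comparison before wiring up SPM, you can snapshot the same JVM stats on every node and diff the GC counters over an interval. A minimal sketch; the interval and the `filter_path` fields here are just one reasonable choice:

```
# Snapshot old-gen GC count/time on all nodes, wait, then snapshot again.
curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.gc.collectors.old.*'
sleep 300
curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.gc.collectors.old.*'

# A client node whose old-gen collection count/time grows much faster than the
# data nodes' is the same pattern you'd be looking for in the SPM GC reports.
```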