I am using elasticsearch-py version 7.5.1.
My cluster has 3 master nodes and 6 data nodes. The master nodes have a 2 GB heap and the data nodes have a 30 GB heap.
How can I send queries only to my data nodes? I read transport.py in the elasticsearch-py source, and it discovers all nodes through the "/_nodes/_all/http" API.
When queries go to the master nodes, their memory comes under pressure and I get a circuit_breaking_exception.
I don't think the client has the ability to differentiate; at least, it's not explicitly called out in the docs that I could see. Hopefully one of the client team engineers can confirm either way.
You can modify how the Transport chooses which nodes to use after a sniff by setting host_info_callback:
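from elasticsearch import Elasticsearch

def only_data_nodes(node_info, host):
    # Keep a sniffed node only if it has the data role and lacks the master role.
    roles = node_info.get("roles", [])
    return host if ("data" in roles and "master" not in roles) else None

es = Elasticsearch(host_info_callback=only_data_nodes, sniff_on_start=True)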
The above will only make requests to nodes that have the data role but not the master role. You can modify it as you see fit. Does that answer your question?
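To check what the callback keeps before pointing it at a real cluster, you can run it against hand-written node entries. This is only a sketch: the addresses and role lists below are made up, and the assumption is that each sniffed node arrives as a dict with a "roles" list, as the callback above expects. It does not need a running cluster or the elasticsearch package.

```python
def only_data_nodes(node_info, host):
    """Return host for data nodes without the master role, else None."""
    roles = node_info.get("roles", [])
    return host if ("data" in roles and "master" not in roles) else None


# Hypothetical (node_info, host) pairs; addresses and roles are illustrative only.
sample_nodes = [
    ({"roles": ["master"]}, {"host": "10.0.0.1", "port": 9200}),
    ({"roles": ["data", "ingest"]}, {"host": "10.0.0.2", "port": 9200}),
    ({"roles": ["master", "data"]}, {"host": "10.0.0.3", "port": 9200}),
    ({}, {"host": "10.0.0.4", "port": 9200}),  # no roles reported
]

# Apply the callback the same way for every entry and keep the non-None results.
kept = [
    host
    for info, host in sample_nodes
    if only_data_nodes(info, host) is not None
]
kept_hosts = [h["host"] for h in kept]
print(kept_hosts)  # ['10.0.0.2']
```

Note that a node holding both the master and data roles is dropped as well, and so is a node that reports no roles at all. If you want to keep either kind, loosen the condition in the callback.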