How do I send queries only to data nodes with elasticsearch-py?

I am using elasticsearch-py version 7.5.1.

I have 3 master nodes and 6 data nodes. The master nodes have a 2 GB heap and the data nodes have a 30 GB heap.

How can I send queries only to my data nodes? From reading the source of transport.py in elasticsearch-py, the client discovers nodes by calling the "/_nodes/_all/http" API, which returns every node in the cluster, master nodes included.

When a query is sent to a master node, that node comes under memory pressure and I get a circuit_breaking_exception.

How can I run queries without going through the master nodes? Thanks.
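The sniff step described in the question can be sketched roughly as follows. This is a simplified, library-free simulation, not the client's actual code: NODES_RESPONSE is a hypothetical, trimmed example of a "/_nodes/_all/http" response (node IDs, names and addresses are made up), and default_like_callback mirrors what the 7.x Transport appears to do by default, as far as I can tell from its source: it skips only nodes whose roles list is exactly ["master"]. sniff_hosts is likewise a hypothetical helper with simplified address parsing.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical, trimmed-down response of GET /_nodes/_all/http.
# Node IDs, names and addresses are made up for illustration.
NODES_RESPONSE = {
    "nodes": {
        "m1": {"name": "master-1", "roles": ["ingest", "master", "ml"],
               "http": {"publish_address": "10.0.0.1:9200"}},
        "m2": {"name": "master-2", "roles": ["master"],
               "http": {"publish_address": "10.0.0.2:9200"}},
        "d1": {"name": "data-1", "roles": ["data", "ingest", "ml"],
               "http": {"publish_address": "10.0.0.11:9200"}},
    }
}

Host = Dict[str, object]
Callback = Callable[[dict, Host], Optional[Host]]


def default_like_callback(node_info: dict, host: Host) -> Optional[Host]:
    """Assumed 7.x default: drop a node only if it is master-only."""
    if node_info.get("roles", []) == ["master"]:
        return None
    return host


def sniff_hosts(response: dict, callback: Callback) -> List[Host]:
    """Simplified sniff: turn each node's HTTP publish address into a host
    dict, then let the callback keep it (return the host) or drop it (None)."""
    hosts = []
    for node_info in response["nodes"].values():
        address = node_info.get("http", {}).get("publish_address")
        if address is None:  # node exposes no HTTP interface
            continue
        hostname, port = address.rsplit(":", 1)
        host = callback(node_info, {"host": hostname, "port": int(port)})
        if host is not None:
            hosts.append(host)
    return hosts


if __name__ == "__main__":
    for h in sniff_hosts(NODES_RESPONSE, default_like_callback):
        print(h)
```

In this made-up response, master-1 also carries the ingest and ml roles, so a default-style filter keeps it as a request target, while the master-only master-2 is dropped. If real master nodes carry extra roles like that, they would still receive queries.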

Welcome to our community! :smiley:

I don't think the client can differentiate between node types; I couldn't find it explicitly called out in the docs. Hopefully one of the client team engineers can confirm either way.

I got an answer on GitHub. For anyone with the same question, see this link for more information.

Copying the response here as well:

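You can modify how the Transport chooses which nodes to use after a sniff by setting host_info_callback. The callback receives each sniffed node's info and its connection parameters; returning the host keeps the node, and returning None skips it: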

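from elasticsearch import Elasticsearch

def only_data_nodes(node_info, host):
    # Keep nodes that have the data role but not the master role.
    roles = node_info.get("roles", [])
    return host if ("data" in roles and "master" not in roles) else None

# sniff_on_start=True sniffs the cluster once when the client is created,
# so the callback filters the node list before any query is sent.
es = Elasticsearch(host_info_callback=only_data_nodes, sniff_on_start=True)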

The above will only send requests to nodes that have the data role but not the master role. You can modify it as you see fit. Does that answer your question?
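Because the callback is a plain function of the node-info dict, it can be checked offline before pointing the client at a cluster. A small sketch, assuming a hypothetical set of role combinations (not taken from the poster's cluster); only_data_nodes uses the same logic as the answer, and kept_nodes is a hypothetical helper for the check.

```python
from typing import Dict, List, Optional

Host = Dict[str, object]


def only_data_nodes(node_info: dict, host: Host) -> Optional[Host]:
    # Same logic as the answer: keep data nodes that are not master-eligible.
    roles = node_info.get("roles", [])
    return host if ("data" in roles and "master" not in roles) else None


# Hypothetical role combinations to check the filter against.
CASES: Dict[str, List[str]] = {
    "dedicated master": ["master"],
    "master with extra roles": ["ingest", "master", "ml"],
    "master + data": ["data", "master"],
    "dedicated data": ["data"],
    "data + ingest + ml": ["data", "ingest", "ml"],
    "coordinating only": [],
}


def kept_nodes(cases: Dict[str, List[str]]) -> List[str]:
    """Return the names of the cases the callback would keep."""
    dummy_host: Host = {"host": "localhost", "port": 9200}
    return [name for name, roles in cases.items()
            if only_data_nodes({"roles": roles}, dummy_host) is not None]


if __name__ == "__main__":
    print(kept_nodes(CASES))
```

Only the two data-only combinations pass. A node that is both data and master is excluded, and so is a coordinating-only node (empty roles list), so the condition would need widening if coordinating-only nodes were added later.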
