I am new to Elasticsearch and am writing Java code to filter some data based on an exact match. I am using the code below to open a connection to the ES cluster.
Client client = TransportClient.builder().build()
.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("host1"), 9300))
.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("host2"), 9300));
If I have 10 nodes in the cluster, how many should I register with the TransportClient through the addTransportAddress method? All 10, only the master nodes, or only the data nodes? Please explain the implications of adding just one versus several. Thanks in advance.
If you set the option client.transport.sniff: true, your TransportClient will be able to sniff (discover) the rest of the cluster.
It means that the client will be able to send traffic to any of the nodes it discovers, not only the ones you listed.
Without this option, the client will send traffic only to the nodes you defined, which is useful if, for example, you want to put some client nodes in front of your Elasticsearch cluster.
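For reference, here is a minimal sketch of turning sniffing on from Java. It assumes the Elasticsearch 2.x Java client (the version that matches the TransportClient.builder() call in the question) is on the classpath. The cluster name "my-cluster" and the host name "host1" are placeholders, not values from this thread.

```java
import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class SniffingClientExample {
    public static void main(String[] args) throws Exception {
        // Assumption: Elasticsearch 2.x client API; "my-cluster" is a placeholder name.
        Settings settings = Settings.settingsBuilder()
                .put("cluster.name", "my-cluster")
                .put("client.transport.sniff", true) // let the client discover other nodes
                .build();

        // One seed address is enough for the client to start sniffing from.
        TransportClient client = TransportClient.builder().settings(settings).build()
                .addTransportAddress(
                        new InetSocketTransportAddress(InetAddress.getByName("host1"), 9300));
        try {
            // Lists the nodes the client is currently connected to.
            System.out.println(client.connectedNodes());
        } finally {
            client.close();
        }
    }
}
```

Leaving out the client.transport.sniff line gives the other behaviour described above: the client talks only to the addresses passed to addTransportAddress.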
Thanks for your reply.
I don't understand which is more beneficial: letting my client sniff, or having it talk only to specific nodes. It would be great if you could shed some light on that.
The only benefit I can see in adding all nodes (or sniffing) is that client calls would be spread across all nodes, so it would distribute the load.
It basically depends on your cluster size.
If you have a cluster with 3 nodes, I would just add all 3 nodes to the TransportClient.
If you have a cluster of 20 data nodes + 3 master nodes + 2 client nodes, I'd send the traffic only to the 2 client nodes.
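That rule of thumb can be sketched in plain Java. This is only an illustration: TransportAddressPicker, Node and Role are hypothetical names made up for this example and are not part of the Elasticsearch API, and giving each node a single role is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Elasticsearch client library.
class TransportAddressPicker {

    // Simplification: each node is given exactly one role.
    enum Role { MASTER, DATA, CLIENT }

    static final class Node {
        final String host;
        final Role role;

        Node(String host, Role role) {
            this.host = host;
            this.role = role;
        }
    }

    // Returns the hosts to pass to addTransportAddress:
    // only the client nodes if the cluster has any, otherwise every node.
    static List<String> pick(List<Node> cluster) {
        List<String> clients = new ArrayList<>();
        List<String> all = new ArrayList<>();
        for (Node node : cluster) {
            all.add(node.host);
            if (node.role == Role.CLIENT) {
                clients.add(node.host);
            }
        }
        return clients.isEmpty() ? all : clients;
    }

    public static void main(String[] args) {
        // Small cluster without dedicated client nodes: use every node.
        List<Node> small = Arrays.asList(
                new Node("host1", Role.DATA),
                new Node("host2", Role.DATA),
                new Node("host3", Role.DATA));
        System.out.println(pick(small)); // prints [host1, host2, host3]

        // Larger cluster with dedicated client nodes: use only those.
        List<Node> large = Arrays.asList(
                new Node("master1", Role.MASTER),
                new Node("data1", Role.DATA),
                new Node("client1", Role.CLIENT),
                new Node("client2", Role.CLIENT));
        System.out.println(pick(large)); // prints [client1, client2]
    }
}
```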
What do you have?
I have yet to decide on the sizing of my production cluster. For my dev instance I have only 3 nodes.
I never use more than one node when developing.
All 3 of mine are on the same machine too, running on different ports. I just wanted to see what effect, if any, multiple nodes have on API calls, performance, etc.
Performance and architecture questions are not related to API usage.
When developing, you should assume that you are sending requests to a cluster, whatever its size.
If you want to run performance tests, do that under real conditions. It does not make sense otherwise, especially if all the nodes are sharing the same hardware.