How to get my java client to discover an entire cluster

Been using ES for a few days now - moving from Compass+Lucene embedded in
our app to this distributed model as we rewrite parts of our app to move
into AWS EC2.

I've just got an EC2 instance set up, backed by S3, and finally got the
connection from my local development box to the EC2 instance to work (darn
you client.transport.sniff).

So now I have a question. My Java client connects to this EC2 instance
explicitly:

final Client client = new TransportClient(settings)
    .addTransportAddress(new InetSocketTransportAddress("some.ip.address", custom.port));

That is fine when there is one node, but when I bring on additional
instances, how does my client find out about them? Is that information
passed along by the ES node I connect to, or does my client need to
discover them itself?

Thanks.

This configuration seems incorrect.

I think I read that setting client.transport.sniff=true on my client would
allow it to discover other nodes.
With true it won't connect at all; with false it doesn't appear to use the
second node.

Could anyone confirm whether my setup is valid?

I created an EC2 instance with this config:

    cluster:
        name: es-dev-cluster
    node:
        master: true
        data: true

    transport.tcp.port: 9300
    http.port: 9200

    action.auto_create_index: true
    index.mapper.dynamic: false

    discovery:
        type: ec2

    gateway:
        type: s3
        s3.bucket: es-dev-cluster

    cloud:
        aws:
            access_key: *********
            secret_key: *********

    cloud.aws.s3.endpoint: s3.amazonaws.com
    cloud.aws.ec2.endpoint: ec2.us-east-1.amazonaws.com

I then snapshotted it and created an AMI. I started two instances from this
AMI and assigned one of the two a static IP.

Using http://staticip:9200/_plugin/head/ I can see both of these nodes.

My client connects explicitly to the static IP:

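        import org.elasticsearch.client.Client;
        import org.elasticsearch.client.transport.TransportClient;
        import org.elasticsearch.common.settings.ImmutableSettings;
        import org.elasticsearch.common.settings.Settings;
        import org.elasticsearch.common.transport.InetSocketTransportAddress;

        final Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", clusterName)      // must match the cluster's name
                .put("discovery.type", "ec2")
                .put("node.name", clientRandomNodeName)
                .put("node.master", false)
                .put("node.data", false)
                .put("client.transport.sniff", false)  // only the address(es) added below are used
                .build();

        final String endpoint = theStaticIp;   // public (static) IP of one node
        final int port = 9300;                 // transport port, not the HTTP port 9200

        final Client client = new TransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(endpoint, port));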

So now my client is connecting to the static IP and I can index new data. I
shut down the node with the static IP and expected the second node, which
was still running, to take over the work. Instead my client is throwing No
Node Found.

I have a couple of suspicions as to what I am doing wrong, but I am not sure
what else I should do.

  • Is it valid for both of these EC2 instances to have the given config?
  • Is the method I am using to connect my client to the static ip ES node
    valid?

Thanks, Mike

I got lost trying to follow where you are at, so I will just explain how the TransportClient works:

  • When sniff is not enabled, it will just use the list of IPs you provided. You can add several of those to the transport client, and it will use only those to communicate with the cluster. In your case, that is the single IP address you provided when constructing it.

  • When sniff is enabled, it will use the address provided to sniff all the nodes in the cluster. It will then use their publish addresses to try and connect to them. So, if the publish address of a node (logged when elasticsearch starts up, or exposed in the nodes info API) is a private IP address, you won't be able to connect to it from the outside world. You can control both the publish address and the bind address of elasticsearch; the simplest way is to set the network.host setting in the config.
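To make the two modes concrete, here is a simplified model in plain Java with no Elasticsearch dependency. `NodeListModel` and `nodesToUse` are hypothetical names, and this is not the TransportClient's actual code; it only shows which addresses the client would end up talking to in each mode, assuming the publish addresses reported by the cluster are already known. The sample IPs are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of which addresses a transport client talks to.
// Not the real TransportClient code; it only mirrors the two modes described above.
public class NodeListModel {

    // seeds: the addresses passed to addTransportAddress(...)
    // publishAddresses: the publish address of every cluster node, as reported when sniffing
    static List<String> nodesToUse(List<String> seeds, List<String> publishAddresses, boolean sniff) {
        if (!sniff) {
            // sniff disabled: only the addresses you added yourself are used
            return new ArrayList<>(seeds);
        }
        // sniff enabled: the seeds are used to learn about the cluster,
        // then the client connects to each node's publish address
        return new ArrayList<>(publishAddresses);
    }

    public static void main(String[] args) {
        List<String> seeds = List.of("203.0.113.10:9300");                   // hypothetical static IP
        List<String> published = List.of("10.0.0.5:9300", "10.0.0.6:9300"); // hypothetical private EC2 IPs

        System.out.println(nodesToUse(seeds, published, false)); // [203.0.113.10:9300]
        System.out.println(nodesToUse(seeds, published, true));  // [10.0.0.5:9300, 10.0.0.6:9300]
    }
}
```

With sniff enabled and only private publish addresses reported, a client outside EC2 would end up with a list it cannot reach, which is consistent with the "with true it won't connect at all" symptom in the question.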

(In reply to MikeNereson's message of Tuesday, February 7, 2012 at 8:29 PM.)

This is relevant and specifically answers my original question.

http://elasticsearch-users.115913.n3.nabble.com/NoNodeAvailableException-No-node-available-on-Elastic-Beanstalk-tp3550240p3551676.html

"sniff will not go and find the cluster automatically. You need to add the
address of at least one node in the cluster to the transport client. What
sniff will do when enabled, is that it will get from that node the rest of
the nodes in the cluster, and use all of those when round robin requests."
-- kimchy

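To illustrate the quoted behaviour, here is a toy round-robin selector in plain Java. `RoundRobinModel` and `next` are hypothetical names; this is not how Elasticsearch implements it internally, and the `IllegalStateException` merely stands in for the NoNodeAvailableException named in the linked thread. It shows why a client that knows only one address fails once that node is shut down, while a list containing both nodes keeps working.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector over a client's list of node addresses.
// A toy model of the behaviour described in the quote, not Elasticsearch code.
public class RoundRobinModel {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinModel(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    // Returns the next node in round-robin order, skipping nodes listed in `down`.
    // Throws when no node in the list is usable.
    String next(Set<String> down) {
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            // each attempt advances the shared counter, so successive calls rotate through the list
            String candidate = nodes.get(Math.floorMod(counter.getAndIncrement(), nodes.size()));
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("No node available");
    }

    public static void main(String[] args) {
        // Only the static IP was added and sniffing is off: the list has one entry.
        RoundRobinModel single = new RoundRobinModel(List.of("static-ip:9300"));
        System.out.println(single.next(Set.of()));            // static-ip:9300
        try {
            single.next(Set.of("static-ip:9300"));            // that node has been shut down
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());               // No node available
        }

        // Both nodes are in the list (added by hand, or learned via sniffing).
        RoundRobinModel both = new RoundRobinModel(List.of("node-a:9300", "node-b:9300"));
        System.out.println(both.next(Set.of()));              // node-a:9300
        System.out.println(both.next(Set.of()));              // node-b:9300
        System.out.println(both.next(Set.of("node-a:9300"))); // node-b:9300
    }
}
```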
Just to be clear regarding the "outside world":

So, if the publish address of the node (logged when elasticsearch starts
up, or exposed in the nodes info API) is the private IP address, you won't
be able to connect to it from the outside world

(1) If the publish address is the EC2 instance's private IP address, it
will only be accessible from other EC2 nodes. Presumably they need to be in
the same AWS region or even in the same availability zone. So my Elastic
Beanstalk applications should be able to access it.

(2) Networks external to AWS, such as my dev box, cannot access it.
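Whether a publish address falls under case (1) can be checked from its form: the private IPv4 ranges (10/8, 172.16/12, 192.168/16) are not routable over the public internet. Below is a minimal Java sketch using the standard `InetAddress.isSiteLocalAddress()`; the class name and sample addresses are hypothetical, and in practice the addresses would come from the startup log or the nodes info API mentioned above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Flags publish addresses in the private IPv4 ranges (10/8, 172.16/12, 192.168/16).
// Those are not routable over the public internet, so a client outside the
// private network (e.g. a dev box at home) cannot connect to them.
public class PublishAddressCheck {

    static boolean isPrivate(String ipLiteral) {
        try {
            // for an IP literal, getByName only parses the string; no DNS lookup happens
            return InetAddress.getByName(ipLiteral).isSiteLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        // hypothetical publish addresses, e.g. copied from the startup log or nodes info API
        String[] publishAddresses = {"10.0.0.5", "172.31.4.20", "203.0.113.10"};
        for (String ip : publishAddresses) {
            System.out.println(ip + (isPrivate(ip)
                    ? " -> private: only reachable from inside the same private network"
                    : " -> not in a private range"));
        }
    }
}
```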