Groovy unicast client cannot connect

Hal_St_Clair · July 2, 2012, 9:27pm

I've set up a test environment with three clustered ElasticSearch servers
and verified that the cluster reports that it is healthy. I'm now trying
to connect to the cluster with a simple test client but I've been
unsuccessful so far.

My servers are at 172.40.6.100, 172.40.6.101, and 172.40.6.102 (see the
attached cluster.state-pretty.txt for the cluster status at the time of the
test).

The Groovy settings closure is:

    nodeBuilder.settings {
        node {
            client = true                    // signifies that this node is a client
        }

        cluster {
            name = 'dlbnaes'
        }

        discovery {
            zen {
                ping {
                    multicast {
                        enabled = false
                    }
                    unicast {
                        hosts = [ "172.40.6.100:9200", "172.40.6.101:9200", "172.40.6.102:9200" ]
                    }
                }
            }
        }
    }

The log for the master node is attached and shows the last attempted
connection at [2012-07-02 11:18:34,722] reporting "invalid version format".

The client and all three server nodes are running ElasticSearch version
0.19.7.

Please let me know if there is something I'm doing wrong here.

Best,

Hal

drewr · July 2, 2012, 9:44pm

Hal St. Clair wrote:

I've set up a test environment with three clustered Elasticsearch
servers and verified that the cluster reports that it is healthy.
I'm now trying to connect to the cluster with a simple test client
but I've been unsuccessful so far.

[...]

                    unicast {
                        hosts = [ "172.40.6.100:9200", "172.40.6.101:9200", "172.40.6.102:9200" ]
                    }

[...]

The log for the master node is attached and shows the last attempted
connection at [2012-07-02 11:18:34,722] reporting "invalid version format".

You want port 9300, or whatever ES reported as the bound_address in
the transport logging. The reason you see "invalid version" is that
the internal wire protocol nodes use to communicate is trying to
talk to the HTTP interface on port 9200.

-Drew

Hal_St_Clair · July 2, 2012, 10:37pm

Thanks for the guidance on this. I've since run the test with all hosts
specifying port 9300 (and again with no port specified at all), but now I
see nothing in the server log and my client still fails with:

    Exception in thread "main" org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];

drewr · July 3, 2012, 4:28am

Hal St. Clair wrote:

Thanks for the guidance on this. I've since run the test with all
hosts specifying port 9300 (and again with no port specified at
all) but now I see nothing in the server log and my client still
fails with

Exception in thread "main"
org.elasticsearch.cluster.block.ClusterBlockException: blocked by:
[SERVICE_UNAVAILABLE/1/state not recovered /
initialized];[SERVICE_UNAVAILABLE/2/no master];

This just means the client cannot connect to one of the data nodes,
not even enough to get a protocol error.

Does the client node have the appropriate access to the data nodes'
port 9300 through firewalls or other obstacles? Also, what's the
output of `curl -s 'localhost:9200/_cluster/health?pretty=1'` on one
of the three data nodes?

-Drew

Hal_St_Clair · July 3, 2012, 1:12pm

I had to open 9200 and 9300 in iptables to get these servers started. I
didn't bother with IP filtering since these are internal test servers. I'm
hitting them from an Ubuntu desktop, which I'm pretty sure has no active
firewall by default.

I've attached a copy of cluster health. Everything looks healthy to me.

drewr · July 3, 2012, 2:26pm

Hal St. Clair wrote:

I had to open 9200 and 9300 in iptables to get these servers
started. I didn't bother with IP filtering since these are
internal test servers. I'm hitting them from an Ubuntu desktop,
which I'm pretty sure has no active firewall by default.

From the node running the client, can you ping any of the data nodes?
The point is that the node client has to be able to hit one of the
data nodes, which it is not doing (and why you see errors about no
master).

I've attached a copy of cluster health. Everything looks healthy
to me.

OK. That was just to make sure your cluster name agreed.

-Drew
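
The cluster-name comparison Drew mentions can also be done mechanically. The following is a minimal Java sketch, not part of any Elasticsearch API: it pulls the `cluster_name` field out of the text returned by the `_cluster/health` call above and compares it with the name the client is meant to use. The class name `ClusterNameCheck`, its method names, and the sample JSON string are hypothetical, and the sketch assumes the health output contains a plain `"cluster_name"` string field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClusterNameCheck {
    // Matches the "cluster_name" string field in the health JSON text.
    private static final Pattern CLUSTER_NAME =
            Pattern.compile("\"cluster_name\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the cluster name found in the health JSON, or null if absent. */
    public static String extractClusterName(String healthJson) {
        Matcher m = CLUSTER_NAME.matcher(healthJson);
        return m.find() ? m.group(1) : null;
    }

    /** True when the name the client is configured with equals the cluster's name. */
    public static boolean namesAgree(String healthJson, String clientClusterName) {
        String actual = extractClusterName(healthJson);
        return actual != null && actual.equals(clientClusterName);
    }

    public static void main(String[] args) {
        // Made-up sample input; paste the real curl output here instead.
        String health = "{ \"cluster_name\" : \"dlbnaes\" }";
        // Case 1: the client really uses the intended name.
        System.out.println(namesAgree(health, "dlbnaes"));
        // Case 2: the client silently stayed on the default name "elasticsearch".
        System.out.println(namesAgree(health, "elasticsearch"));
    }
}
```

This only helps if the second argument is the name the client actually ends up with at runtime, not the name written in its configuration source.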

Hal_St_Clair · July 3, 2012, 3:21pm

I'm able to access the HTTP interface without a problem. More puzzling, I
just ran a Wireshark trace on the communications between the client and
server and I can see packets going back and forth between the client and
all three hosts right up until the client reports that it is unable to
connect. Unfortunately, I can make neither heads nor tails of the trace.
It appears that it is simply trying to connect over and over again and as
near as I can tell the response looks valid.

Incidentally, I also tried to connect to only one of the hosts but had no
success with that either.

drewr · July 3, 2012, 4:16pm

Hal St. Clair wrote:

I'm able to access the http interface without a problem.

The node client doesn't talk to the HTTP interface. You need to make
sure 9300 is accessible. How about:

    telnet 172.40.6.100 9300

Does that at least connect?

If it does, start the master node with logging.yml rootLogger set to
TRACE and watch the log as you try to connect with the node client.

-Drew
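
The same reachability test can be run from the JVM the client lives in, which rules out differences between the shell environment and the application. This is a minimal sketch using only `java.net`; the class name `PortProbe`, the method `canConnect`, and the 2000 ms timeout are hypothetical choices, and the host list is the one given in the first post.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    /** Returns true if a TCP connection to host:port opens within timeoutMillis. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable: the port is not usable from here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Data node addresses from the first post; 9300 is the transport port.
        String[] hosts = { "172.40.6.100", "172.40.6.101", "172.40.6.102" };
        for (String host : hosts) {
            System.out.println(host + ":9300 reachable = " + canConnect(host, 9300, 2000));
        }
    }
}
```

Like telnet, a successful connect only shows that the port is open from the client machine. It says nothing about whether the node will accept the client into the cluster.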

Hal_St_Clair · July 3, 2012, 6:04pm

I probably should have been more specific about the wire trace. I can see
the traffic between the two on 9300. In any event, telnet to port 9300
connects successfully.

When I shut down all but one of the nodes and enable trace-level logging on
the remaining node, the log (attached) shows the following repeated
over and over:

    [2012-07-02 05:55:30,921][TRACE][transport.netty ] [“dlbnaes01”] channel opened: [id: 0x2e686cea, /172.40.4.33:48947 => /172.40.6.100:9300]
    [2012-07-02 05:55:31,280][DEBUG][river.mongodb ] [“dlbnaes01”] [mongodb][mongodb] oasis_rest.target last timestamp: { "$ts" : 1341330603 , "$inc" : 1}
    [2012-07-02 05:55:31,280][DEBUG][river.mongodb ] [“dlbnaes01”] [mongodb][mongodb] Using filter: { "ts" : { "$gt" : { "$ts" : 1341330603 , "$inc" : 1}} , "ns" : { "$regex" : "oasis_rest.target" , "$options" : ""}}
    [2012-07-02 05:55:31,873][DEBUG][river.mongodb ] [“dlbnaes01”] [mongodb][mongodb] oasis_rest.target last timestamp: { "$ts" : 1341330603 , "$inc" : 1}
    [2012-07-02 05:55:31,873][DEBUG][river.mongodb ] [“dlbnaes01”] [mongodb][mongodb] Using filter: { "ts" : { "$gt" : { "$ts" : 1341330603 , "$inc" : 1}} , "ns" : { "$regex" : "oasis_rest.target" , "$options" : ""}}
    [2012-07-02 05:55:32,465][DEBUG][river.mongodb ] [“dlbnaes01”] [mongodb][mongodb] oasis_rest.target last timestamp: { "$ts" : 1341330603 , "$inc" : 1}
    [2012-07-02 05:55:32,465][DEBUG][river.mongodb ] [“dlbnaes01”] [mongodb][mongodb] Using filter: { "ts" : { "$gt" : { "$ts" : 1341330603 , "$inc" : 1}} , "ns" : { "$regex" : "oasis_rest.target" , "$options" : ""}}
    [2012-07-02 05:55:33,059][DEBUG][river.mongodb ] [“dlbnaes01”] [mongodb][mongodb] oasis_rest.target last timestamp: { "$ts" : 1341330603 , "$inc" : 1}
    [2012-07-02 05:55:33,060][DEBUG][river.mongodb ] [“dlbnaes01”] [mongodb][mongodb] Using filter: { "ts" : { "$gt" : { "$ts" : 1341330603 , "$inc" : 1}} , "ns" : { "$regex" : "oasis_rest.target" , "$options" : ""}}
    [2012-07-02 05:55:33,654][DEBUG][river.mongodb ] [“dlbnaes01”] [mongodb][mongodb] oasis_rest.target last timestamp: { "$ts" : 1341330603 , "$inc" : 1}
    [2012-07-02 05:55:33,654][DEBUG][river.mongodb ] [“dlbnaes01”] [mongodb][mongodb] Using filter: { "ts" : { "$gt" : { "$ts" : 1341330603 , "$inc" : 1}} , "ns" : { "$regex" : "oasis_rest.target" , "$options" : ""}}
    [2012-07-02 05:55:33,924][TRACE][transport.netty ] [“dlbnaes01”] channel closed: [id: 0x2e686cea, /172.40.4.33:48947 :> /172.40.6.100:9300]

For completeness, I removed the mongodb river that I have set up for
testing and re-ran my test, but the results are unchanged (apart from the
disappearance of the mongodb lines in the log).

Best,

Hal

drewr · July 3, 2012, 7:24pm

Hal St. Clair wrote:

I probably should have been more specific about the wire trace. I can see
the traffic between the two on 9300. In any event, telnet to port 9300
connects successfully

OK.

[2012-07-02 05:55:30,921][TRACE][transport.netty ] [“dlbnaes01”]
channel opened: [id: 0x2e686cea, /172.40.4.33:48947 => /172.40.6.100:9300]

I'm guessing 172.40.4.33 is the Groovy client. Can you paste more of
the relevant Groovy code somewhere? Maybe there's something subtle
you're missing.

The rest of the settings look ok (although the pretty quotes in the
node names are a little distracting :-)).

-Drew

Hal_St_Clair · July 3, 2012, 8:05pm

That's right, 172.40.4.33 is the Ubuntu client. I've attached the code so
you can see it. The application returns from buildElasticSearchNode()
after about thirty seconds, but the rest of the code doesn't do much since
the node was never really able to connect to the cluster. The exception is
thrown on line 64.

The code is a little messy since I'm still trying to get it to work.

drewr · July 3, 2012, 8:38pm

Hal St. Clair wrote:

That's right 172.40.4.33 is the Ubuntu client. I've attached the
code so you can see it. The application returns from
buildElasticSearchNode() after about thirty seconds but the rest of
the code doesn't do much since the node was never really able to
connect to the cluster. The exception is thrown on line 64.

I'm not sure what's going on. It has some kind of communication with
the data node but then no discovery or other activity occurs.

What version of the Groovy client are you using? Data nodes are
0.19.7 right?

-Drew

Hal_St_Clair · July 3, 2012, 10:24pm

Data nodes and client are all 0.19.7.

Good to know that it isn't something dead obvious, I guess.

Hal_St_Clair · July 9, 2012, 5:38pm

For the sake of completeness, I've just upgraded to 0.19.8 and retested with
the same results.

Has anyone out there done this? Our target environment prohibits multicast
(even though we would firewall it within a private network segment) so I
need to either get this unicast thing working or I have to pull the plug on
my current work with Elasticsearch. I really don't want to abandon it over
this issue.

Igor_Motov · July 10, 2012, 1:13am

OK, this is a tricky one. Basically, when used in static scope, "name" in the
following closure points to the class name and doesn't affect your settings:

        cluster {
            name = 'dlbnaes'
        }

As a result, you are creating a client node in the "elasticsearch" cluster,
and such a node cannot connect to nodes in the "dlbnaes" cluster. Moving this
closure out of static scope, or adding the cluster name to the settings with
the put method, fixes the issue for me:

    nodeBuilder.settings.put("cluster.name", "dlbnaes")
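
The `put` call sidesteps closure name resolution because the key is a plain dotted string. As a sketch only, the rest of the closure from the first post can be written as flat keys in the same way. The Java class below builds such a map with nothing but `java.util`. The class name `FlatClientSettings` and its `build` method are hypothetical. The key names are simply the nested block names joined with dots, and passing each entry to `put` as above, or giving the hosts as one comma-separated string, are assumptions that were not tested in this thread.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlatClientSettings {
    /** Builds dotted setting keys mirroring the nested closure in the first post. */
    public static Map<String, String> build(String clusterName, List<String> transportHosts) {
        Map<String, String> settings = new LinkedHashMap<String, String>();
        settings.put("cluster.name", clusterName);                      // must match the servers
        settings.put("node.client", "true");                            // client-only node
        settings.put("discovery.zen.ping.multicast.enabled", "false");  // no multicast
        settings.put("discovery.zen.ping.unicast.hosts",
                String.join(",", transportHosts));                      // transport port, not HTTP
        return settings;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList(
                "172.40.6.100:9300", "172.40.6.101:9300", "172.40.6.102:9300");
        // Print each key/value pair so the effective cluster name is visible.
        for (Map.Entry<String, String> e : build("dlbnaes", hosts).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Printing the effective settings before the node starts makes a silently ignored cluster name visible right away, rather than after a thirty-second discovery timeout.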
