Tribe node can't connect after installing Shield

Note: right now I only have one cluster up for my tribe nodes to connect to, and before installing Shield the tribe nodes could connect fine.
After installing Shield, I created a system key, copied it to all nodes in the cluster, and then restarted each node in the cluster.

The data/master nodes were able to join the cluster but the tribe nodes cannot.
Multicast is disabled on all nodes and the node lists are correct in elasticsearch.yml.
No Shield-specific configuration has been set in elasticsearch.yml, so all defaults should be in place.
I have not yet set up keystores for node-to-node encryption; only users and roles for HTTP access.
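
For reference, the steps I followed were roughly these (the paths assume the RPM layout; "other-node" is a placeholder hostname):

# Generate the system key on one node with the syskeygen tool that ships with Shield
/usr/share/elasticsearch/bin/shield/syskeygen

# Copy the resulting key to the same location on every other node, then restart each node
scp /etc/elasticsearch/shield/system_key other-node:/etc/elasticsearch/shield/system_key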

I appreciate any help.

The error I get in the tribe cluster log is as follows:

[2015-07-03 11:25:55,997][WARN ][discovery.zen.ping.unicast] [HOSTNAME/cluster-name] failed to send ping to [[#zen_unicast_2#][HOSTNAME][inet[HOSTNAME_OF_MASTER/IP_OF_MASTER:9300]]]
org.elasticsearch.transport.RemoteTransportException: [HOSTNAME_OF_MASTER][inet[/<IP_OF_MASTER>:9300]][internal:discovery/zen/unicast_gte_1_4]
Caused by: org.elasticsearch.shield.crypto.SignatureException: tampered signed text
at org.elasticsearch.shield.crypto.InternalCryptoService.unsignAndVerify(InternalCryptoService.java:161)
at org.elasticsearch.shield.authc.InternalAuthenticationService.authenticate(InternalAuthenticationService.java:99)
at org.elasticsearch.shield.transport.ServerTransportFilter$NodeProfile.inbound(ServerTransportFilter.java:71)
at org.elasticsearch.shield.transport.ShieldServerTransportService$ProfileSecuredRequestHandler.messageReceived(ShieldServerTransportService.java:171)
at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:222)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:114)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.handler.ipfilter.IpFilteringHandlerImpl.handleUpstream(IpFilteringHandlerImpl.java:154)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Shield needs to be installed on all nodes, including the tribe node. Did you do that?

Yes - Shield is installed on all nodes, and each node has the same system key.

Can you try removing the system key from all nodes and then seeing if everything can connect?
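
Something like this on each node, assuming the RPM layout (adjust the path to wherever your key actually lives):

# Move the key aside rather than deleting it, then restart the node
mv /etc/elasticsearch/shield/system_key /tmp/system_key.bak

With no key present there is no message signing, so this should tell us whether a key mismatch is the cause.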

It appears as though the system key is not being recognized on some nodes while it may be recognized on others.

It seems to be an issue specifically with tribe nodes. This morning I changed the machines that couldn't connect from tribe nodes to regular client nodes (HTTP, no master, no data) and they were able to connect fine.
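
In elasticsearch.yml terms, the change on those machines amounted to roughly this (tribe settings removed, node left as a plain HTTP client):

node.master: false
node.data: false
http.enabled: true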

Has anyone seen Tribe nodes work with the Shield version below?

Shield info:

curl --user testuser:XXXXXX 'localhost:9200/_shield'
{
  "status" : "enabled",
  "name" : "hostnamel",
  "cluster_name" : "cluster",
  "version" : {
    "number" : "1.2.1",
    "build_hash" : "f2cc2f1d3d7a0647412917d33a27890a1d958742",
    "build_timestamp" : "2015-04-29T16:46:24Z",
    "build_snapshot" : false
  },
  "tagline" : "You know, for security"
}

jaymode - I'll try as you requested as well.

jaymode,

After removing the system key from all nodes the tribe nodes were able to join the cluster.

It seems the system key is somehow different across the nodes. Could you try generating a new one and copying it to all of the nodes?

@jaymode,

The system key is the same on every node; I verified this using md5sum.
Besides, if the system key were different, it shouldn't work when I switch from a tribe node to a regular client node either.
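
For what it's worth, the check I ran on each node was just this (path per the RPM layout), and every digest matched:

md5sum /etc/elasticsearch/shield/system_key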

I'll try to reproduce this issue. What version of elasticsearch are you using?

Thanks!

Elasticsearch: Version: 1.6.0, Build: cdd3ac4/2015-06-09T13:36:34Z, JVM: 1.8.0_40

Hi @Mrc0113,

I spent some time today trying to reproduce the issue and was unable to do so; it appears to be working correctly for me. My config is the following:

Elasticsearch version:

{
  "status" : 200,
  "name" : "Access",
  "cluster_name" : "tribe",
  "version" : {
    "number" : "1.6.0",
    "build_hash" : "cdd3ac4dde4f69524ec0a14de3828cb95bbb86d0",
    "build_timestamp" : "2015-06-09T13:36:34Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
} 

Shield version:

{
  "status" : "enabled",
  "name" : "Access",
  "cluster_name" : "tribe",
  "version" : {
    "number" : "1.2.1",
    "build_hash" : "f2cc2f1d3d7a0647412917d33a27890a1d958742",
    "build_timestamp" : "2015-04-29T16:46:24Z",
    "build_snapshot" : false
  },
  "tagline" : "You know, for security"
}

I have two nodes: one is master-only and the other is data-only (this also worked with both configured as master and data nodes).

Master only config:

cluster.name: "c1"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["localhost:9300", "localhost:9301"]
transport.tcp.port: 9300
node.master: true
node.data: false

Data only node config:

cluster.name: "c1"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["localhost:9300", "localhost:9301"]
transport.tcp.port: 9301
node.master: false

Then a single tribe node with the following config:

cluster.name: "tribe"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: "localhost:9302"
transport.tcp.port: 9302

tribe:
  c1:
    cluster.name: c1
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["localhost:9300", "localhost:9301"]

Could you try a simple configuration like the above and see if it also works for you with the system key? Any more details on your installation, such as whether you installed from an RPM or a tar file?

@jaymode,

Interesting. Thanks for trying. I installed via RPM on RHEL 6 with Oracle Java 1.8.
I'll give it a shot with minimal config at some point over the next few days.

{
  "status" : 200,
  "name" : "name",
  "cluster_name" : "cluster",
  "version" : {
    "number" : "1.6.0",
    "build_hash" : "cdd3ac4dde4f69524ec0a14de3828cb95bbb86d0",
    "build_timestamp" : "2015-06-09T13:36:34Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}

The only other thing I can think of is that when the environment isn't configured correctly, the system_key gets placed in /usr/share/elasticsearch/config/shield instead of /etc/elasticsearch/shield. Maybe there is one node like that, but that doesn't seem likely based on your previous comments about a node client working.
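
A quick way to rule that out on each node is to check both locations (paths from the comment above):

ls -l /etc/elasticsearch/shield/system_key
ls -l /usr/share/elasticsearch/config/shield/system_key

A node whose only copy sits under /usr/share would effectively be running without the key, which could produce exactly the "tampered signed text" error.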

I had the same problem. Two nodes were installed in /opt/elasticsearch and one node was in /opt/elastic.

Hmm... I need to get my config act together.