Invalid internal transport message format (ff,f4,ff,fd)

Hello, I'm trying to configure ES on AWS Elastic Beanstalk (EB). I have two nodes, and one of them fails with the following message:

{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":"waited for [30s]"}],"type":"master_not_discovered_exception","reason":"waited for [30s]"},"status":503}

When I looked at the error logs, I saw:

[2015-11-14 09:23:39,501][WARN ][transport.netty ] [718e01a158f7] exception caught on transport layer [[id: 0x801cfa11, /10.170.122.175:44259 :> /172.17.0.5:9300]], closing connection
java.io.StreamCorruptedException: invalid internal transport message format, got (ff,f4,ff,fd)
at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:64)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
at org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:482)
.......
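
A side note on the four bytes in the log message, offered as an assumption and not as a confirmed diagnosis: ff, f4 and fd are Telnet command codes defined in RFC 854, and a telnet client typically sends this sequence when it is interrupted (for example with Ctrl-C). A minimal Python sketch that labels the bytes (the `TELNET_COMMANDS` table and `describe` helper are hypothetical names used only for illustration):

```python
# Telnet command codes from RFC 854 (only the ones relevant here).
TELNET_COMMANDS = {
    0xFF: "IAC (Interpret As Command)",
    0xF4: "IP (Interrupt Process)",
    0xFD: "DO",
}


def describe(raw):
    """Label each byte with its Telnet command name, or show it as hex."""
    return [TELNET_COMMANDS.get(b, "0x%02x" % b) for b in raw]


# The four bytes reported in the StreamCorruptedException message.
logged = bytes.fromhex("fff4fffd")

for label in describe(logged):
    print(label)

# The transport layer expects the marker "ES" here, so these bytes are rejected.
print(logged[:2] == b"ES")
```

If that reading is right, the warning may have been triggered by a manual telnet connection to port 9300 and not by the other node.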

From the code in SizeHeaderFrameDecoder.java you can see that ES expects the first two bytes to be equal to "ES"; however, tcpdump shows something different:

10.151.143.59.39581 > 10.170.122.175.9300: Flags [P.], cksum 0x1ff5 (incorrect -> 0xefbc), seq 1:164, ack 1, win 229, options [nop,nop,TS val 1296347 ecr 1292999], length 163
	0x0000:  4500 00d7 d46e 4000 3f06 4787 0a97 8f3b  E....n@.?.G....;
	0x0010:  0aaa 7aaf 9a9d 2454 4f01 13b7 7d46 d95c  ..z...$TO...}F.\
	0x0020:  8018 00e5 1ff5 0000 0101 080a 0013 c7db  ................
	0x0030:  0013 bac7 4553 0000 009d 0000 0000 0000  ....ES..........
	0x0040:  0183 0000 1e84 e31e 696e 7465 726e 616c  ........internal
	0x0050:  3a64 6973 636f 7665 7279 2f7a 656e 2f75  :discovery/zen/u
	0x0060:  6e69 6361 7374 0000 0000 4100 0000 00b2  nicast....A.....
	0x0070:  d05e 000a 6573 2d73 7461 6769 6e67 0c30  .^..es-staging.0
	0x0080:  6534 6535 6461 6461 3463 6216 5436 4f31  e4e5dada4cb.T6O1
	0x0090:  6368 5932 5233 6938 3452 665f 6178 4236  chY2R3i84Rf_axB6
	0x00a0:  4f51 0d32 3535 2e32 3535 2e32 3535 2e30  OQ.255.255.255.0
	0x00b0:  0d32 3535 2e32 3535 2e32 3535 2e30 0001  .255.255.255.0..
	0x00c0:  04ff ffff 0000 0024 5400 e389 7a00 0000  .......$T...z...

As you can see, the first two bytes are 0x4500 ("E.") and not "ES".
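
A note on reading the dump: tcpdump's hex output starts at the IPv4 header, so 0x45 at offset 0 is the IP version/header-length byte and not the start of the transport message. A minimal Python sketch, assuming the standard IPv4 and TCP header layouts and a big-endian 4-byte length following the two-byte marker (this is an illustration, not the Elasticsearch code itself), skips both headers and inspects the first payload bytes:

```python
import struct

# First 64 bytes of the packet, copied from the tcpdump output above
# (offsets 0x0000-0x003f). The dump begins at the IP header.
DUMP = """
4500 00d7 d46e 4000 3f06 4787 0a97 8f3b
0aaa 7aaf 9a9d 2454 4f01 13b7 7d46 d95c
8018 00e5 1ff5 0000 0101 080a 0013 c7db
0013 bac7 4553 0000 009d 0000 0000 0000
"""


def tcp_payload(packet):
    """Strip the IPv4 and TCP headers and return the TCP payload."""
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    # IHL (low nibble of byte 0) counts 32-bit words.
    ip_header_len = (packet[0] & 0x0F) * 4
    # TCP data offset (high nibble of TCP byte 12) also counts 32-bit words.
    tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
    return packet[ip_header_len + tcp_header_len:]


packet = bytes.fromhex("".join(DUMP.split()))
payload = tcp_payload(packet)

marker = payload[:2]
(length,) = struct.unpack(">i", payload[2:6])
print(marker, length)
```

For this packet the payload begins at offset 0x34 with 45 53 ("ES"), and the declared length 0x9d (157) plus the 6 header bytes equals the 163 bytes tcpdump reports, so this particular packet looks well-formed.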

Both nodes are started from a Docker container and have identical environments; the cluster name is es-staging.

How do I fix this problem?

Thanks

Are you running the same version on both nodes? Could you check that?

Node 1

elasticsearch@bb0b992ff048:/$ java -version
openjdk version "1.8.0_66-internal"
OpenJDK Runtime Environment (build 1.8.0_66-internal-b17)
OpenJDK 64-Bit Server VM (build 25.66-b17, mixed mode)

{
  "name" : "bb0b992ff048",
  "cluster_name" : "es-staging",
  "version" : {
    "number" : "2.0.0",
    "build_hash" : "de54438d6af8f9340d50c5c786151783ce7d6be5",
    "build_timestamp" : "2015-10-22T08:09:48Z",
    "build_snapshot" : false,
    "lucene_version" : "5.2.1"
  },
  "tagline" : "You Know, for Search"
}

Node 2

openjdk version "1.8.0_66-internal"
OpenJDK Runtime Environment (build 1.8.0_66-internal-b17)
OpenJDK 64-Bit Server VM (build 25.66-b17, mixed mode)

{
  "name" : "3b414043c7df",
  "cluster_name" : "es-staging",
  "version" : {
    "number" : "2.0.0",
    "build_hash" : "de54438d6af8f9340d50c5c786151783ce7d6be5",
    "build_timestamp" : "2015-10-22T08:09:48Z",
    "build_snapshot" : false,
    "lucene_version" : "5.2.1"
  },
  "tagline" : "You Know, for Search"
}

As I said, both nodes are absolutely identical, because they are built from the same Docker image.

That is strange. Any chance you could run it outside the Docker context, so we can make sure Docker does not interfere here?
What does your Dockerfile look like?

The Dockerfile is pretty straightforward:

FROM elasticsearch:2

RUN apt-get update && apt-get upgrade -y && apt-get install -y telnet

# RUN mkdir /scripts
ADD config/ /etc/elasticsearch

EXPOSE 9200 9300

USER elasticsearch
ENTRYPOINT ["elasticsearch", "--path.conf=/etc/elasticsearch"]

I tried to run the same version from the Dockerfile on my local Mac and everything worked fine. I could try the Elastic Beanstalk Java environment and set up ES via .ebextensions, and I'm pretty sure that would work; however, I wanted to solve this problem using Docker.

Just in case, here is my Elasticsearch config:

cluster:
    name: ${CLUSTER_NAME}

plugin.mandatory: cloud-aws

cloud.aws:
  access_key: XXXXXXXXX
  secret_key: YYYYYYYYY
  region: us-east-1

discovery.type: ec2
discovery.ec2.ping_timeout: 30s
discovery.ec2.tag.Name: ${EC2_TAG_NAME}
discovery.ec2.host_type: private_dns
discovery.zen.ping.multicast.enabled: false

discovery.zen.ping_timeout: 30s
discovery.zen.ping.unicast:
  host: []

network.host: ${HOSTNAME}
network.publish_host: 255.255.255.0

http:
    host: 0.0.0.0
    compression: true

    cors:
        enabled: true
        allow-origin: '*'

Maybe, if you're more familiar with the ES code, you can point me to the place where ES generates requests to other nodes, so I can try to debug the issue further.

Hey David,

So I created a new deployment without Docker and it still fails on Elastic Beanstalk. I'll update this message with more information a bit later.

The issue is related to the Docker setup. When I installed ES directly on the EC2 instances, the nodes joined the cluster without any issues.

@vladmiller did you find a way to do it in Docker? I am also very interested in doing this on Elastic Beanstalk; however, I have failed every time.

Hi @xkidro

No, I have not found a way to do it in Docker. Instead, I use the Java 8 environment and download ES directly to the host machine rather than into a Docker container.

Using an EB extension, I guess? SSH-ing into it every time would be weird for an automated environment like Elastic Beanstalk.

If so, do you mind sharing it?

Yes, using ebextensions.

I will publish that on the weekend and send you a link. Busy weekdays.

@vladmiller that's awesome news, thanks!

@xkidro, apologies. My wife demanded my time on the weekend.

Here is the URL for the repo with the config: https://github.com/vladmiller/elasticsearch-beanstalk
If you want to improve it, please do; I am not the best sysops guy out there.

@vladmiller thanks a lot! I will test this out when I have time. I'm not a good one myself.