Invalid internal transport message format (ff,f4,ff,fd)

(Vlad Miller) #1

Hello, I'm trying to configure ES on the AWS EB service. I have two nodes, and one of them fails with the following message:

{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":"waited for [30s]"}],"type":"master_not_discovered_exception","reason":"waited for [30s]"},"status":503}

When I looked at the error logs, I saw:

[2015-11-14 09:23:39,501][WARN ][transport.netty ] [718e01a158f7] exception caught on transport layer [[id: 0x801cfa11, / :> /]], closing connection invalid internal transport message format, got (ff,f4,ff,fd)
at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(
at org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(
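
A note on the four bytes in the warning: ES checks that a transport message starts with the two bytes E and S, and (ff,f4,ff,fd) fails that check. Those values also happen to be Telnet command codes from RFC 854, so one possibility (only a guess, nothing in the log confirms it) is that this particular connection came from a telnet client rather than from the other node. A minimal Python sketch, with hypothetical helper names, that decodes the bytes:

```python
# Telnet command codes as defined in RFC 854 (subset).
TELNET_COMMANDS = {
    0xF4: "IP (Interrupt Process)",
    0xFB: "WILL",
    0xFC: "WONT",
    0xFD: "DO",
    0xFE: "DONT",
    0xFF: "IAC (Interpret As Command)",
}

# The two marker bytes ES expects at the start of a transport message.
ES_MARKER = b"ES"


def looks_like_es_frame(header):
    """Return True if the header starts with the 'ES' marker."""
    return header[:2] == ES_MARKER


def describe_telnet(header):
    """Name each byte as a Telnet command, or show it as hex if unknown."""
    return [TELNET_COMMANDS.get(b, "0x%02x" % b) for b in header]


# Bytes reported in the warning: (ff,f4,ff,fd)
rejected = bytes([0xFF, 0xF4, 0xFF, 0xFD])

print(looks_like_es_frame(rejected))
for name in describe_telnet(rejected):
    print(name)
```

If that guess is right, the warning would be a side effect of a manual connectivity test against the transport port and not the reason the node cannot find a master.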

From the code you can see that ES expects the first two bytes to be equal to ES; however, in tcpdump I see something different:
> Flags [P.], cksum 0x1ff5 (incorrect -> 0xefbc), seq 1:164, ack 1, win 229, options [nop,nop,TS val 1296347 ecr 1292999], length 163
	0x0000:  4500 00d7 d46e 4000 3f06 4787 0a97 8f3b  E....n@.?.G....;
	0x0010:  0aaa 7aaf 9a9d 2454 4f01 13b7 7d46 d95c  ..z...$TO...}F.\
	0x0020:  8018 00e5 1ff5 0000 0101 080a 0013 c7db  ................
	0x0030:  0013 bac7 4553 0000 009d 0000 0000 0000  ....ES..........
	0x0040:  0183 0000 1e84 e31e 696e 7465 726e 616c  ........internal
	0x0050:  3a64 6973 636f 7665 7279 2f7a 656e 2f75  :discovery/zen/u
	0x0060:  6e69 6361 7374 0000 0000 4100 0000 00b2  nicast....A.....
	0x0070:  d05e 000a 6573 2d73 7461 6769 6e67 0c30  .^..es-staging.0
	0x0080:  6534 6535 6461 6461 3463 6216 5436 4f31  e4e5dada4cb.T6O1
	0x0090:  6368 5932 5233 6938 3452 665f 6178 4236  chY2R3i84Rf_axB6
	0x00a0:  4f51 0d32 3535 2e32 3535 2e32 3535 2e30  OQ.255.255.255.0
	0x00b0:  0d32 3535 2e32 3535 2e32 3535 2e30 0001  .255.255.255.0..
	0x00c0:  04ff ffff 0000 0024 5400 e389 7a00 0000  .......$T...z...

As you can see, the first two bytes are 0x4500 (E.) and not ES.
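
A note on reading this dump: tcpdump's hex output normally starts at the IP header, so 0x45 here is most likely the IPv4 version/header-length byte rather than the start of the ES message. A minimal Python sketch, assuming the dump starts at an IPv4 header with TCP directly after it (the function name is only for illustration), skips both headers and looks at the first payload bytes:

```python
# First 64 bytes of the packet, copied from the tcpdump hex dump above.
PACKET_HEX = """
4500 00d7 d46e 4000 3f06 4787 0a97 8f3b
0aaa 7aaf 9a9d 2454 4f01 13b7 7d46 d95c
8018 00e5 1ff5 0000 0101 080a 0013 c7db
0013 bac7 4553 0000 009d 0000 0000 0000
"""


def tcp_payload_offset(packet):
    """Offset of the TCP payload, assuming the packet starts at an IPv4 header."""
    # Low nibble of byte 0 is the IPv4 header length in 32-bit words.
    ip_header_len = (packet[0] & 0x0F) * 4
    # High nibble of TCP byte 12 is the TCP header length in 32-bit words.
    tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
    return ip_header_len + tcp_header_len


# Remove all whitespace before decoding the hex digits.
packet = bytes.fromhex("".join(PACKET_HEX.split()))

offset = tcp_payload_offset(packet)
ip_header_len = (packet[0] & 0x0F) * 4
# Destination port is bytes 2-3 of the TCP header.
dst_port = int.from_bytes(packet[ip_header_len + 2:ip_header_len + 4], "big")

print("payload starts at byte", offset)
print("destination port:", dst_port)
print("first two payload bytes:", packet[offset:offset + 2])
```

If the assumption holds, the payload of this packet, which is addressed to port 9300, does begin with ES, so this particular packet may not be the one the warning complains about.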

Both nodes are started from the same Docker image and have identical environments; the cluster name is es-staging.

How do I fix this problem?


(David Pilato) #2

Are you running the same version on both nodes? Could you check that?

(Vlad Miller) #3

Node 1

elasticsearch@bb0b992ff048:/$ java -version
openjdk version "1.8.0_66-internal"
OpenJDK Runtime Environment (build 1.8.0_66-internal-b17)
OpenJDK 64-Bit Server VM (build 25.66-b17, mixed mode)

{
  "name" : "bb0b992ff048",
  "cluster_name" : "es-staging",
  "version" : {
    "number" : "2.0.0",
    "build_hash" : "de54438d6af8f9340d50c5c786151783ce7d6be5",
    "build_timestamp" : "2015-10-22T08:09:48Z",
    "build_snapshot" : false,
    "lucene_version" : "5.2.1"
  },
  "tagline" : "You Know, for Search"
}

Node 2

openjdk version "1.8.0_66-internal"
OpenJDK Runtime Environment (build 1.8.0_66-internal-b17)
OpenJDK 64-Bit Server VM (build 25.66-b17, mixed mode)

{
  "name" : "3b414043c7df",
  "cluster_name" : "es-staging",
  "version" : {
    "number" : "2.0.0",
    "build_hash" : "de54438d6af8f9340d50c5c786151783ce7d6be5",
    "build_timestamp" : "2015-10-22T08:09:48Z",
    "build_snapshot" : false,
    "lucene_version" : "5.2.1"
  },
  "tagline" : "You Know, for Search"
}

As I said, both nodes are absolutely identical because they are built from the same Docker image.

(David Pilato) #4

That is strange. Any chance you could run it outside docker context so we make sure docker does not interfere here?
What does your docker file look like?

(Vlad Miller) #5

The Dockerfile is pretty straightforward:

FROM elasticsearch:2

RUN apt-get update && apt-get upgrade && apt-get install telnet

# RUN mkdir /scripts
ADD config/ /etc/elasticsearch

EXPOSE 9200 9300

USER elasticsearch
ENTRYPOINT ["elasticsearch", "--path.conf=/etc/elasticsearch"]

I tried running the same version from this Dockerfile on my local Mac and everything worked fine. I could use the EBT Java environment and set up ES via .ebextensions, and I'm pretty sure that would work; however, I wanted to solve this problem using Docker.

Just in case, here is my Elasticsearch config:

    name: ${CLUSTER_NAME}

plugin.mandatory: cloud-aws
  access_key: XXXXXXXXX
  secret_key: YYYYYYYYY
  region: us-east-1

discovery.type: ec2
discovery.ec2.ping_timeout: 30s
discovery.ec2.tag.Name: ${EC2_TAG_NAME}
discovery.ec2.host_type: private_dns false

discovery.zen.ping_timeout: 30s
  host: [] ${HOSTNAME}

    compression: true

        enabled: true
        allow-origin: '*'

If you're more familiar with the ES code, maybe you can point me to the place where ES generates requests to other nodes, so I can debug the issue further.

(Vlad Miller) #6

Hey David,

So I created a new deployment without Docker, and it still fails on EBT. I'll update this message with more information a bit later.

(Vlad Miller) #7

The issue is related to the Docker setup. When I installed ES directly on the EC2 instances, the nodes joined the cluster without any issues.

(Xkidro) #8

@vladmiller did you find a way to do it in Docker? I am also very interested in doing this on Elastic Beanstalk, but I have failed every time.

(Vlad Miller) #9

Hi @xkidro

No, I have not found a way to do it in Docker. Instead, I use the java-8 environment and download ES directly to the host machine rather than into a Docker container.

(Xkidro) #10

Using an EB extension, I guess? SSH-ing into it every time would be weird for an automated environment like Elastic Beanstalk.

If so, do you mind sharing it?

(Vlad Miller) #11

Yes, using ebextensions.

I will publish it over the weekend and send you a link. Busy weekdays :slight_smile:

(Xkidro) #12

@vladmiller that's awesome news, thanks!

(Vlad Miller) #13

@xkidro, apologies. Wife demanded my time on weekends :smile:

Here is the URL for the repo with the config
If you want to improve it, please do; I am not the best sysops guy out there.

(Xkidro) #14

@vladmiller thanks a lot! I will test this out when I have time. I'm not a good one myself :smiley:
