Multiple nodes on Elasticsearch


(Ganesh) #1

Hi,
My requirement is to run multiple nodes in an Elasticsearch cluster.

I have 3 servers.
All 3 are in a cluster and all 3 nodes are master nodes.

When I create a data node on one server, it fails to join the master.

I'm using a Docker container to run the new Elasticsearch node on the same server. Could anyone please help me with this?


(David Turner) #2

This is likely to be a networking issue, and the logs will contain helpful clues. Could you share them?


(Ganesh) #3

Please see the error from my data node log for reference:

[2018-10-19T08:32:49,850][WARN ][o.e.d.z.ZenDiscovery ] [datanode1-020034.phx.aexp.com] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again


(David Turner) #4

That one line on its own is not very useful, but I think there will be more messages than this.


(Ganesh) #5

Sorry for the late reply @DavidTurner

My log continuously shows only the error below:

[2018-10-22T11:36:42,744][INFO ][o.e.d.z.ZenDiscovery ] [datanode1-ac020034.com] failed to send join request to master [{node-ac020035.com}{JI7hjoheRQW0-WDje0lWgg}{HLycvNZbQJe1GXFCtTCv3w}{10.x.xxx.14}{10.x.xxx.14:9300}], reason [RemoteTransportException[[node-ac020035.phx.aexp.com][172.19.0.2:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[datanode1-ac020034.phx.aexp.com][172.18.0.3:9301] connect_timeout[30s]]; ]


(David Turner) #6

The general form of this message suggests that the master node is failing to connect back to the data node at its advertised address and is timing out after 30 seconds in its attempt to do so, which still suggests a networking misconfiguration.

It's hard to give any more precise feedback because this single line from the log has been post-processed after Elasticsearch wrote it. It's impossible to know which of the strange things in this line are the effects of the post-processing and which are related to your actual problem. For instance, the node names do not match up:

datanode1-ac020034.com vs datanode1-ac020034.phx.aexp.com
node-ac020035.com vs node-ac020035.phx.aexp.com

Also, some of the IP addresses are 10.x.x.x and some are 172.x.x.x, which is unusual and might or might not be the source of the problem.

Also this line would have been followed by a stack trace that would help to explain the situation in more detail.
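To compare the addresses in a join-failure line without eyeballing it, here is a minimal Python sketch that pulls out the three (name, address) pairs that matter. It assumes the line keeps the Elasticsearch 6.x ZenDiscovery layout quoted in this thread; the regular expressions, `join_failure_addresses` and `SAMPLE_LINE` are hypothetical helpers, not part of Elasticsearch.

```python
import re
from typing import Optional

# Hypothetical parser. Assumes the 6.x "failed to send join request" layout
# seen in this thread; other versions may format the message differently.
_MASTER = re.compile(r"to master \[\{([^}]*)\}.*?\{([^{}]*:\d+)\}\]")
_REMOTE = re.compile(r"RemoteTransportException\[\[([^\]]*)\]\[([^\]]*)\]")
_CONNECT = re.compile(r"ConnectTransportException\[\[([^\]]*)\]\[([^\]]*)\]")

# One of the log lines posted in this thread, used as sample input.
SAMPLE_LINE = (
    "[2018-10-23T09:08:37,007][INFO ][o.e.d.z.ZenDiscovery     ] "
    "[datanode1-lpdosput020034.phx.aexp.com] failed to send join request to master "
    "[{node-lpdosput020036.phx.aexp.com}{jDOMBhIiToWXKpIMHtbGEw}"
    "{mCS4w9RqR3uPsMVo_Xwm8A}{10.3.237.23}{10.3.237.23:9300}], reason "
    "[RemoteTransportException[[node-lpdosput020036.phx.aexp.com][172.20.0.2:9300]"
    "[internal:discovery/zen/join]]; nested: ConnectTransportException["
    "[datanode1-lpdosput020034.phx.aexp.com][172.18.0.3:9301] connect_timeout[30s]]; ]"
)


def join_failure_addresses(line: str) -> Optional[dict]:
    """Return the (name, address) pairs from a join-failure line, or None."""
    master = _MASTER.search(line)
    remote = _REMOTE.search(line)
    connect = _CONNECT.search(line)
    if not (master and remote and connect):
        return None  # not a line in the expected format
    return {
        # the transport address the data node used to reach the master
        "master_advertised": (master.group(1), master.group(2)),
        # the address the master reported for itself in the remote exception
        "master_reported": (remote.group(1), remote.group(2)),
        # the data node address the master failed to connect back to
        "connect_back_target": (connect.group(1), connect.group(2)),
    }


if __name__ == "__main__":
    for role, (name, addr) in join_failure_addresses(SAMPLE_LINE).items():
        print(f"{role:<20} {addr:<18} {name}")
```

Putting the three addresses side by side makes mismatches, such as a 10.x master address next to a 172.x connect-back target, easy to spot.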


(Ganesh) #7

Please find more log data for your reference @DavidTurner

[2018-10-23T09:07:58,261][INFO ][o.e.e.NodeEnvironment    ] [datanode1-lpdosput020034.phx.aexp.com] heap size [990.7mb], compressed ordinary object pointers [true]
[2018-10-23T09:07:58,271][INFO ][o.e.n.Node               ] [datanode1-lpdosput020034.phx.aexp.com] node name [datanode1-lpdosput020034.phx.aexp.com], node ID [gS66sbvASX26ByNmk7fRWQ]
[2018-10-23T09:07:58,272][INFO ][o.e.n.Node               ] [datanode1-lpdosput020034.phx.aexp.com] version[6.2.4], pid[1], build[ccec39f/2018-04-12T20:37:28.497551Z], OS[Linux/3.10.0-862.11.6.el7.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_171/25.171-b10]
[2018-10-23T09:07:58,272][INFO ][o.e.n.Node               ] [datanode1-lpdosput020034.phx.aexp.com] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.EXryU7po, -XX:+HeapDumpOnOutOfMemoryError, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -Xloggc:logs/gc.log, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -Des.cgroups.hierarchy.override=/, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config]
[2018-10-23T09:08:00,227][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [aggs-matrix-stats]
[2018-10-23T09:08:00,228][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [analysis-common]
[2018-10-23T09:08:00,228][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [ingest-common]
[2018-10-23T09:08:00,228][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [lang-expression]
[2018-10-23T09:08:00,228][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [lang-mustache]
[2018-10-23T09:08:00,228][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [lang-painless]
[2018-10-23T09:08:00,228][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [mapper-extras]
[2018-10-23T09:08:00,228][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [parent-join]
[2018-10-23T09:08:00,228][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [percolator]
[2018-10-23T09:08:00,228][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [rank-eval]
[2018-10-23T09:08:00,228][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [reindex]
[2018-10-23T09:08:00,228][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [repository-url]
[2018-10-23T09:08:00,228][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [transport-netty4]
[2018-10-23T09:08:00,228][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [tribe]
[2018-10-23T09:08:00,229][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [ingest-geoip]
[2018-10-23T09:08:00,229][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [ingest-user-agent]
[2018-10-23T09:08:00,229][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-core]
[2018-10-23T09:08:00,229][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-deprecation]
[2018-10-23T09:08:00,229][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-graph]
[2018-10-23T09:08:00,229][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-logstash]
[2018-10-23T09:08:00,229][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-ml]
[2018-10-23T09:08:00,229][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-monitoring]
[2018-10-23T09:08:00,230][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-security]
[2018-10-23T09:08:00,230][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-upgrade]
[2018-10-23T09:08:00,230][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-watcher]
[2018-10-23T09:08:02,532][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/119] [Main.cc@128] controller (64 bit): Version 6.2.4 (Build 524e7fe231abc1) Copyright (c) 2018 Elasticsearch BV
[2018-10-23T09:08:03,130][INFO ][o.e.d.DiscoveryModule    ] [datanode1-lpdosput020034.phx.aexp.com] using discovery type [zen]
[2018-10-23T09:08:03,796][INFO ][o.e.n.Node               ] [datanode1-lpdosput020034.phx.aexp.com] initialized
[2018-10-23T09:08:03,796][INFO ][o.e.n.Node               ] [datanode1-lpdosput020034.phx.aexp.com] starting ...
[2018-10-23T09:08:03,927][INFO ][o.e.t.TransportService   ] [datanode1-lpdosput020034.phx.aexp.com] publish_address {172.18.0.3:9301}, bound_addresses {[::]:9301}
[2018-10-23T09:08:03,948][INFO ][o.e.b.BootstrapChecks    ] [datanode1-lpdosput020034.phx.aexp.com] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2018-10-23T09:08:33,973][WARN ][o.e.n.Node               ] [datanode1-lpdosput020034.phx.aexp.com] timed out while waiting for initial discovery state - timeout: 30s
[2018-10-23T09:08:33,980][INFO ][o.e.h.n.Netty4HttpServerTransport] [datanode1-lpdosput020034.phx.aexp.com] publish_address {172.18.0.3:9200}, bound_addresses {[::]:9200}
[2018-10-23T09:08:33,980][INFO ][o.e.n.Node               ] [datanode1-lpdosput020034.phx.aexp.com] started
[2018-10-23T09:08:37,007][INFO ][o.e.d.z.ZenDiscovery     ] [datanode1-lpdosput020034.phx.aexp.com] failed to send join request to master [{node-lpdosput020036.phx.aexp.com}{jDOMBhIiToWXKpIMHtbGEw}{mCS4w9RqR3uPsMVo_Xwm8A}{10.3.237.23}{10.3.237.23:9300}], reason [RemoteTransportException[[node-lpdosput020036.phx.aexp.com][172.20.0.2:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[datanode1-lpdosput020034.phx.aexp.com][172.18.0.3:9301] connect_timeout[30s]]; ]

(David Turner) #8

Excellent, that makes it much easier to describe what's going on:

This node is claiming to be listening at 172.18.0.3:9301:

[2018-10-23T09:08:03,927][INFO ][o.e.t.TransportService   ] [datanode1-lpdosput020034.phx.aexp.com] publish_address {172.18.0.3:9301}, bound_addresses {[::]:9301}

It wishes to connect to the master at 10.3.237.23:9300:

{node-lpdosput020036.phx.aexp.com}{jDOMBhIiToWXKpIMHtbGEw}{mCS4w9RqR3uPsMVo_Xwm8A}{10.3.237.23}{10.3.237.23:9300}

The master sees the connection from the data node arrive via 172.20.0.2:9300:

[node-lpdosput020036.phx.aexp.com][172.20.0.2:9300]

It then tries to connect back to the data node at its publish_address (172.18.0.3:9301) but this times out.

The discrepancy between the 10.x.x.x addresses and the 172.x.x.x addresses seems important. The data node managed to connect to the master at 10.3.237.23 but the master cannot connect back on 172.18.0.3. Perhaps you can fix the routing in your Docker network to allow this to happen, or perhaps the data node should be binding to a 10.x.x.x address instead.
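The comparison above can be sketched in a few lines of Python. This assumes, as is common but not guaranteed, that the 172.x addresses come from Docker bridge networks allocated inside 172.16.0.0/12 (Docker usually starts at 172.17.0.0/16) and that the 10.x addresses belong to the hosts; `DOCKER_BRIDGE_POOL`, `split_hostport` and `publish_address_warning` are hypothetical names, and only IPv4 `host:port` strings are handled.

```python
import ipaddress

# Assumption: Docker bridge subnets on these hosts come from 172.16.0.0/12.
# Adjust this if the Docker daemon is configured with other address pools.
DOCKER_BRIDGE_POOL = ipaddress.ip_network("172.16.0.0/12")


def split_hostport(hostport: str) -> tuple:
    """Split an IPv4 'host:port' string into (IPv4Address, int)."""
    host, _, port = hostport.rpartition(":")
    return ipaddress.ip_address(host), int(port)


def publish_address_warning(publish: str, master: str) -> str:
    """Warn if the data node publishes a container-local address while the
    master was reached on a host address; return '' otherwise."""
    pub_ip, pub_port = split_hostport(publish)
    master_ip, _ = split_hostport(master)
    if pub_ip in DOCKER_BRIDGE_POOL and master_ip not in DOCKER_BRIDGE_POOL:
        return (f"publish address {pub_ip}:{pub_port} looks container-local, "
                f"but the master was reached at host address {master_ip}; "
                "the master may have no route back to it")
    return ""


if __name__ == "__main__":
    # Addresses taken from the log lines above.
    print(publish_address_warning("172.18.0.3:9301", "10.3.237.23:9300")
          or "no obvious problem")
```

Passing this check only means the publish address is not obviously container-local; it says nothing about whether anything is actually listening on that address and port.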


(Ganesh) #9

Thanks for your valuable input,

I'm using Docker containers, and one server contains 1 master and 10 data nodes.

Is it possible to define the data node address as 10.x.x.x:9301 instead of 172.x.x.x:9301?

When I define the data node IP as 10.x.x.x:9301, I get the following issue in the log:

[2018-10-23T11:28:45,347][INFO ][o.e.n.Node               ] [datanode1-lpdosput020034.phx.aexp.com] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.Q7rxVIRY, -XX:+HeapDumpOnOutOfMemoryError, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -Xloggc:logs/gc.log, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -Des.cgroups.hierarchy.override=/, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config]
[2018-10-23T11:28:47,371][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [aggs-matrix-stats]
[2018-10-23T11:28:47,371][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [analysis-common]
[2018-10-23T11:28:47,371][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [ingest-common]
[2018-10-23T11:28:47,371][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [lang-expression]
[2018-10-23T11:28:47,372][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [lang-mustache]
[2018-10-23T11:28:47,372][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [lang-painless]
[2018-10-23T11:28:47,372][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [mapper-extras]
[2018-10-23T11:28:47,372][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [parent-join]
[2018-10-23T11:28:47,372][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [percolator]
[2018-10-23T11:28:47,372][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [rank-eval]
[2018-10-23T11:28:47,372][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [reindex]
[2018-10-23T11:28:47,372][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [repository-url]
[2018-10-23T11:28:47,372][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [transport-netty4]
[2018-10-23T11:28:47,372][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded module [tribe]
[2018-10-23T11:28:47,373][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [ingest-geoip]
[2018-10-23T11:28:47,373][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [ingest-user-agent]
[2018-10-23T11:28:47,373][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-core]
[2018-10-23T11:28:47,373][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-deprecation]
[2018-10-23T11:28:47,373][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-graph]
[2018-10-23T11:28:47,373][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-logstash]
[2018-10-23T11:28:47,373][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-ml]
[2018-10-23T11:28:47,373][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-monitoring]
[2018-10-23T11:28:47,373][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-security]
[2018-10-23T11:28:47,373][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-upgrade]
[2018-10-23T11:28:47,373][INFO ][o.e.p.PluginsService     ] [datanode1-lpdosput020034.phx.aexp.com] loaded plugin [x-pack-watcher]
[2018-10-23T11:28:49,780][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/119] [Main.cc@128] controller (64 bit): Version 6.2.4 (Build 524e7fe231abc1) Copyright (c) 2018 Elasticsearch BV
[2018-10-23T11:28:50,434][INFO ][o.e.d.DiscoveryModule    ] [datanode1-lpdosput020034.phx.aexp.com] using discovery type [zen]
[2018-10-23T11:28:51,100][INFO ][o.e.n.Node               ] [datanode1-lpdosput020034.phx.aexp.com] initialized
[2018-10-23T11:28:51,100][INFO ][o.e.n.Node               ] [datanode1-lpdosput020034.phx.aexp.com] starting ...
[2018-10-23T11:28:51,226][INFO ][o.e.t.TransportService   ] [datanode1-lpdosput020034.phx.aexp.com] publish_address {10.3.237.16:9301}, bound_addresses {[::]:9301}
[2018-10-23T11:28:51,248][INFO ][o.e.b.BootstrapChecks    ] [datanode1-lpdosput020034.phx.aexp.com] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2018-10-23T11:28:54,304][INFO ][o.e.d.z.ZenDiscovery     ] [datanode1-lpdosput020034.phx.aexp.com] failed to send join request to master [{node-lpdosput020036.phx.aexp.com}{jDOMBhIiToWXKpIMHtbGEw}{mCS4w9RqR3uPsMVo_Xwm8A}{10.3.237.23}{10.3.237.23:9300}], reason [RemoteTransportException[[node-lpdosput020036.phx.aexp.com][172.20.0.2:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[datanode1-lpdosput020034.phx.aexp.com][10.3.237.16:9301] connect_exception]; nested: IOException[Connection refused: 10.3.237.16/10.3.237.16:9301]; nested: IOException[Connection refused]; ]
[2018-10-23T11:28:57,378][INFO ][o.e.d.z.ZenDiscovery     ] [datanode1-lpdosput020034.phx.aexp.com] failed to send join request to master [{node-lpdosput020036.phx.aexp.com}{jDOMBhIiToWXKpIMHtbGEw}{mCS4w9RqR3uPsMVo_Xwm8A}{10.3.237.23}{10.3.237.23:9300}], reason [RemoteTransportException[[node-lpdosput020036.phx.aexp.com][172.20.0.2:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[datanode1-lpdosput020034.phx.aexp.com][10.3.237.16:9301] connect_exception]; nested: IOException[Connection refused: 10.3.237.16/10.3.237.16:9301]; nested: IOException[Connection refused]; ]

(David Turner) #10

This indicates the same basic problem: the master can't connect back to the data node. However, it now sees the connection to 10.3.237.16:9301 being actively refused rather than simply timing out. This is still some kind of network configuration issue, but of a different kind.
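The difference between `connect_timeout[30s]` and `Connection refused` can be reproduced with a plain TCP connect. A timeout usually means packets are dropped or unroutable, while a refusal means something answered but nothing accepted on that port (for example, a container port that is not published on the host). A minimal Python sketch; `probe` is a hypothetical helper, and it is only meaningful when run from the same network namespace as the master, ideally inside the master's container.

```python
import socket
import sys


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Reached the host (or a REJECT rule), but nothing accepted on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: routing or firewall dropping packets is likely.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "no route to host".
        return f"error: {exc}"


if __name__ == "__main__":
    # Usage: python probe.py 10.3.237.16 9301
    print(probe(sys.argv[1], int(sys.argv[2])))
```

Running it against the data node's publish address from the master's side should show the same distinction that appears in the two sets of logs above.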


(Ganesh) #11

@DavidTurner
This has been resolved, and now I can form the cluster with both data and master nodes.

Thank you so much for your kind replies.

