jhonsouza
(Jhonatan Souza de Siqueira)
July 2, 2024, 7:40pm
1
Hi, guys!
I'm trying to run Elasticsearch on AWS ECS, and I'm having a problem when nodes try to join the cluster. When the other nodes are on the same instance as the initial master node, they can join the cluster, but when they are on a different instance, they cannot. Below is the elasticsearch.yml I use.
Initial master node
cluster:
  name: '${STACK_NAME}'
  initial_master_nodes:
    - '${HOSTNAME}-es-initial-master'
  routing:
    allocation:
      awareness:
        attributes: aws_availability_zone
node:
  name: '${HOSTNAME}-es-initial-master'
  roles: [master]
path:
  data: /usr/share/elasticsearch/data
  logs: /usr/share/elasticsearch/logs
bootstrap:
  memory_lock: true
discovery:
  zen:
    minimum_master_nodes: 2
  ec2:
    endpoint: ec2.us-east-2.amazonaws.com
network:
  host: 0.0.0.0
xpack:
  monitoring:
    collection:
      interval: 10s
      enabled: true
  security:
    enabled: true
cloud:
  node:
    auto_attributes: true
http:
  cors:
    enabled: true
    allow-origin: "*"
ingest:
  geoip:
    downloader:
      enabled: false
Data node
cluster:
  name: '${STACK_NAME}'
node:
  name: '${HOSTNAME}-es-data-${SUFFIX}'
  roles: ["data"]
path:
  data: /usr/share/elasticsearch/data
  logs: /usr/share/elasticsearch/logs
bootstrap:
  memory_lock: true
discovery:
  seed_providers: ec2
  ec2:
    endpoint: ec2.us-east-2.amazonaws.com
  seed_hosts: []
s3:
  client:
    default:
      endpoint: s3.us-east-2.amazonaws.com
network:
  host: 0.0.0.0
xpack:
  monitoring:
    collection:
      interval: 10s
      enabled: true
  security:
    enabled: true
cloud:
  node:
    auto_attributes: true
http:
  cors:
    enabled: true
    allow-origin: "*"
ingest:
  geoip:
    downloader:
      enabled: false
If anyone can help me, I'd really appreciate it!
See these docs for guidance about how to troubleshoot discovery problems, including the things to look for in logs etc. If you need help understanding your logs, please share them here.
jhonsouza
(Jhonatan Souza de Siqueira)
July 3, 2024, 2:02pm
3
Hi David. Thanks for the docs, they will be very helpful. I found this while looking at the logs: the node finds the eligible master, but can't complete the connection.
Yep that'd do it - these docs are what you need here.
jhonsouza
(Jhonatan Souza de Siqueira)
July 3, 2024, 2:29pm
5
Thanks a lot, David! I'll read this now!
jhonsouza
(Jhonatan Souza de Siqueira)
July 3, 2024, 7:43pm
6
If I configure network.publish_host and network.bind_host with the value 0.0.0.0, shouldn't that resolve my problem?
0.0.0.0 can be fairly trappy for network.publish_host, especially if there's some kind of proxying or NAT going on as appears to be the case in your environment. I'd recommend being more specific. The log message you shared indicates that one possible step towards a resolution would be to specify network.publish_host: 172.30.5.137 on node deccdfd16c64-es-initial-master.
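For illustration, the split between binding and publishing might look something like this in elasticsearch.yml on that node. This is only a sketch: the address is the one taken from the log message mentioned above, and whether it is the right one depends on how the container networking is set up.

```yaml
network:
  # Listen on every interface inside the container.
  bind_host: 0.0.0.0
  # Advertise one specific address that the other nodes can actually reach.
  # 172.30.5.137 is the address suggested for deccdfd16c64-es-initial-master;
  # treat it as an assumption and substitute the node's real reachable address.
  publish_host: 172.30.5.137
```

The bind address controls where the node listens, while the publish address is what it tells the rest of the cluster to connect to, so only the latter needs to be specific.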
jhonsouza
(Jhonatan Souza de Siqueira)
July 4, 2024, 1:39pm
8
Thanks, @DavidTurner! I changed network.publish_host: 0.0.0.0 to network.publish_host: _ec2_. Now I get a new error.
Would you copy the text of the errors (formatted with the </> button) rather than screenshots? Screenshots are pretty much unreadable here. And include the whole message, not just the few lines you screenshotted, because there's important detail missing here.
jhonsouza
(Jhonatan Souza de Siqueira)
July 5, 2024, 7:27pm
10
Sorry for the delay, here are the logs:
{"type": "server", "timestamp": "2024-07-05T18:48:15,080Z", "level": "INFO", "component": "o.e.c.c.JoinHelper", "cluster.name": "dev-es", "node.name": "es-kibana", "message": "failed to join {es-initial-master}{0uGjtOVrQ2qUspiQo9tMLw}{zzUo1K6cRwmdJpkuzBNBAw}{172.30.4.30}{172.30.4.30:9300}{m}{aws_availability_zone=us-east-2b, xpack.installed=true, transform.node=false} with JoinRequest{sourceNode={es-kibana}{2Slv2ZZdRQa1yNV7NPazyw}{XsiGsq4-QVqaMl44QKp8ig}{172.17.0.8}{172.17.0.8:9300}{dir}{aws_availability_zone=us-east-2a, xpack.installed=true, transform.node=false}, minimumTerm=1, optionalJoin=Optional[Join{term=1, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={es-kibana}{2Slv2ZZdRQa1yNV7NPazyw}{XsiGsq4-QVqaMl44QKp8ig}{172.17.0.8}{172.17.0.8:9300}{dir}{aws_availability_zone=us-east-2a, xpack.installed=true, transform.node=false}, targetNode={es-initial-master}{0uGjtOVrQ2qUspiQo9tMLw}{zzUo1K6cRwmdJpkuzBNBAw}{172.30.4.30}{172.30.4.30:9300}{m}{aws_availability_zone=us-east-2b, xpack.installed=true, transform.node=false}}]}",
"stacktrace": ["org.elasticsearch.transport.RemoteTransportException: [es-initial-master][172.17.0.10:9300][internal:cluster/coordination/join]",
"Caused by: org.elasticsearch.transport.ConnectTransportException: [es-kibana][172.17.0.8:9300] handshake failed. unexpected remote node {es-client-18d27a6938af}{oxS_gycZTI6_T_pjkLeXTw}{WZicVUy9SeibYfn0xkuAqg}{172.17.0.8}{172.17.0.8:9300}{r}{aws_availability_zone=us-east-2b, xpack.installed=true, transform.node=false}",
"at org.elasticsearch.transport.TransportService.lambda$connectionValidator$6(TransportService.java:468) ~[elasticsearch-7.17.15.jar:7.17.15]",
"at org.elasticsearch.action.ActionListener$MappedActionListener.onResponse(ActionListener.java:95) ~[elasticsearch-7.17.15.jar:7.17.15]",
"at org.elasticsearch.transport.TransportService.lambda$handshake$9(TransportService.java:577) ~[elasticsearch-7.17.15.jar:7.17.15]",
"at org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:219) ~[elasticsearch-7.17.15.jar:7.17.15]",
"at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:43) ~[elasticsearch-7.17.15.jar:7.17.15]",
"at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1471) ~[elasticsearch-7.17.15.jar:7.17.15]",
"at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1471) ~[elasticsearch-7.17.15.jar:7.17.15]",
"at org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:352) ~[elasticsearch-7.17.15.jar:7.17.15]",
"at org.elasticsearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:340) ~[elasticsearch-7.17.15.jar:7.17.15]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) ~[elasticsearch-7.17.15.jar:7.17.15]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]",
"at java.lang.Thread.run(Thread.java:1583) [?:?]"] }
Seems that you have two nodes that both claim to be at 172.17.0.8:9300, that's not going to work. Every node needs its own address.
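One way this can happen is when containers on different hosts each publish an address from their own local Docker bridge network (172.17.0.x), so the same address points at a different node depending on which host you connect from. That is a guess about this environment, not something the log proves. A minimal sketch of settings that give a node a distinct published address, assuming the discovery-ec2 plugin is installed and that host port 9301 (a made-up value) is mapped to the container's transport port:

```yaml
network:
  bind_host: 0.0.0.0
  # Publish the EC2 instance's private IPv4 address rather than the
  # container's bridge address (resolved by the discovery-ec2 plugin).
  publish_host: _ec2:privateIpv4_
transport:
  # Port the node listens on inside the container.
  port: 9300
  # Port other nodes should connect to; 9301 is a hypothetical host port
  # mapping, needed only if several nodes share one instance.
  publish_port: 9301
```

Whatever values are used, the combination of published host and published port has to be unique per node and reachable from every other node.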
jhonsouza
(Jhonatan Souza de Siqueira)
July 8, 2024, 8:54pm
13
Hi, @DavidTurner! Thank you so much for your help with the troubleshooting. It's working now. I'll document the solution here to help others who run into these problems.