Subject: Connection error "all nodes failed" writing Hive to ES 7.7.1 via Spark UPSERT
Body:
Hi Elastic community,
We're encountering a persistent `Connection error (check network and/or proxy settings) - all nodes failed` error when UPSERTing Hive data into Elasticsearch through Spark. Both clusters are confirmed to be operational.
Environment Details:
- Elasticsearch: v7.7.1 (27 data nodes + 3 dedicated master nodes)
- Network: corporate internal network; a domain name resolves to the 3 master nodes
- Write mode: Spark UPSERT operations
- Data volume: batch size = 10,000 records per write
Current Configuration:
```properties
# Connectivity
target.es.nodes.wan.only = true   # access via domain name
# Write tuning
target.es.batch.write.refresh = false
batchSize = 10000
# Spark resources
ndi.spark.spark-argument.executor-cores = 2
ndi.spark.spark-argument.num-executors = 2
ndi.spark.spark-conf.spark.dynamicAllocation.maxExecutors = 4
```
Troubleshooting Done:
✓ Validated cluster health (Green status)
✓ Confirmed DNS resolution to master nodes
✓ Tested basic curl connectivity to ES masters
✓ Reduced batch size & executors to limit load
Critical Questions:

1. Domain configuration:
   - Is `target.es.nodes.wan.only=true` sufficient when using DNS resolution?
   - Should we explicitly specify `target.es.nodes` with all master IPs instead? (A rough sketch of what we mean follows this item.)
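For reference, this is the kind of explicit node list we would try, assuming our platform's `target.` prefix maps directly onto the standard ES-Hadoop `es.nodes` / `es.nodes.discovery` keys (the IPs below are placeholders, not our real masters):

```properties
# Hypothetical explicit node list instead of relying on the domain name alone
target.es.nodes = 10.0.0.1:9200,10.0.0.2:9200,10.0.0.3:9200   # placeholder master IPs
target.es.nodes.wan.only = true      # keep client traffic on the declared nodes only
target.es.nodes.discovery = false    # do not attempt to discover the 27 data nodes
```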
2. UPSERT-specific issues:
   - Could document version conflicts during UPSERTs cause node-wide failures?
   - Is additional setup (e.g. `es.mapping.id`) required for UPSERTs versus plain inserts? (Our current understanding is sketched below.)
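To make that question concrete, this is the upsert-related configuration we believe is needed, assuming the standard `es.write.operation` and `es.mapping.id` keys apply under our `target.` prefix (the field name `doc_id` is only an example, not our real key column):

```properties
# Assumed upsert setup (doc_id is a placeholder for our Hive key column)
target.es.write.operation = upsert   # update if the document exists, insert otherwise
target.es.mapping.id = doc_id        # column used as the Elasticsearch document _id
```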
3. Node failure diagnostics:
   - Where can we find connection-refusal details in the ES 7.7.1 logs?
   - Are there recommended `net.tcp` settings for heavy batch UPSERTs? (The client-side knobs we are aware of are listed below.)
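For context, these are the client-side timeout and retry settings we would consider tuning on the ES-Hadoop side; the values shown are purely illustrative, and we are assuming the standard keys work under our `target.` prefix:

```properties
# Illustrative values only - not what we currently run
target.es.http.timeout = 2m              # HTTP connection/read timeout towards ES
target.es.http.retries = 3               # retries for failed HTTP requests
target.es.batch.write.retry.count = 5    # retries for rejected bulk documents
target.es.batch.write.retry.wait = 30s   # wait between bulk retries
```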
4. Proxy pitfalls:
   - How can we verify whether an outbound proxy interferes with ES-Hadoop?
   - Which `http.proxy*` parameters are required if an internal proxy exists? (See the sketch after this list.)
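In case it matters for the answer, this is how we understand the proxy settings would be declared, assuming the standard `es.net.proxy.http.*` keys (host and port below are placeholders):

```properties
# Hypothetical proxy declaration (placeholder host/port)
target.es.net.proxy.http.host = proxy.internal.example.com
target.es.net.proxy.http.port = 8080
# target.es.net.proxy.http.user = ...   # only if the proxy requires authentication
# target.es.net.proxy.http.pass = ...
```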
Thanks for your expertise!