I am trying to use the Spark API to write data from a file to an ES index.
My ES cluster config:
3 data nodes (with HTTP disabled)
2 client nodes (with no data stored on them)
As I understand it, the Spark API works only via the HTTP interface. So when I point my process at the client nodes, I get an exception indicating that the process does not have access to any of the shards.
Can I use the Spark API in a scenario like mine, where I do not have HTTP access to the data nodes?
If you are using client nodes, you just need to configure ES-Hadoop accordingly - see
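For illustration, here is a rough sketch of what such a configuration might look like from PySpark. It assumes the `es.nodes.client.only` setting from the ES-Hadoop configuration options is the relevant one for this setup, and that HTTP is enabled on the client nodes. The host names, the index/type and the `build_es_conf` helper are made up for the example, not taken from the thread.

```python
def build_es_conf(client_nodes, resource, port=9200):
    """Build ES-Hadoop settings that route requests through client nodes.

    client_nodes and resource are placeholders supplied by the caller;
    the keys are ES-Hadoop configuration option names.
    """
    return {
        # Only the client nodes are listed, since the data nodes have HTTP disabled.
        "es.nodes": ",".join(client_nodes),
        "es.port": str(port),
        # Target index/type for the write.
        "es.resource": resource,
        # Assumption: this tells ES-Hadoop to talk to client nodes only
        # instead of connecting to the data nodes directly.
        "es.nodes.client.only": "true",
    }


# Hypothetical host names and index, for illustration only.
conf = build_es_conf(["client-node-1", "client-node-2"], "myindex/mytype")
for key in sorted(conf):
    print(key, "=", conf[key])

# With a running cluster, the dict would then be handed to Spark, e.g.:
# rdd.saveAsNewAPIHadoopFile(
#     path="-",
#     outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
#     keyClass="org.apache.hadoop.io.NullWritable",
#     valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
#     conf=conf)
```

The Spark call is left as a comment because it needs a live cluster and the ES-Hadoop jar on the classpath; the settings dict is the part that matters for the client-node setup.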
Thanks for the brilliant tip. I had not looked at all the config
options available (my bad). That said, I was wondering if there was a
reason to go with HTTP only and not the transport client? I mean, why not
enable the transport client?
The transport client is meant for internal use - it might be deprecated in the
future and it is tied to the version of ES being used. Furthermore, it is
REST, on the other hand, is fully supported, easy to debug/monitor/route,
provides version isolation and, in general, is fully available through
existing libraries, so ES-Hadoop itself is quite small.
Furthermore, having hundreds of clients connect to ES through REST rather
than the transport client is better (much more scalable), in particular from
the perspective of a node client.