Indexing 10 million documents using PySpark

I have an ES cluster with 3 dedicated master nodes and 5 dedicated data nodes.
When I try to index using PySpark, everything goes fine until about the halfway mark. Then I see warnings reported by Spark that say:
Cannot detect ES version - typically this happens when the cluster is not accessible or if the es.nodes.wan.only parameter is incorrect.

I am passing all the nodes in my cluster in the es.nodes parameter, and I have set es.nodes.wan.only to true.
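For context, here is a minimal sketch of the kind of write configuration I am describing (the hostnames, port, index name, and input path below are placeholders, not my actual values, and the elasticsearch-hadoop / elasticsearch-spark connector jar is assumed to be on the classpath, e.g. via --packages):

```python
from pyspark.sql import SparkSession

# Sketch only: hostnames, port, index name, and input path are placeholders.
spark = SparkSession.builder.appName("es-bulk-index").getOrCreate()

df = spark.read.json("/path/to/documents")  # placeholder input

(df.write
   .format("org.elasticsearch.spark.sql")
   .option("es.nodes", "data-node-1,data-node-2,data-node-3,data-node-4,data-node-5")
   .option("es.port", "9200")
   .option("es.nodes.wan.only", "true")
   .option("es.resource", "my-index")  # placeholder index name
   .mode("append")
   .save())
```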

What am I missing here?

@praveen_S if you dig into the logs, there should be a corresponding reason why it cannot detect the ES version. This occurs during initial task start-up, so perhaps some of your Spark worker nodes cannot find a route to the specified Elasticsearch nodes?
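One rough way to check that from the Spark side is to run a tiny task across the workers and have each one try to reach an Elasticsearch HTTP endpoint. A sketch, assuming one of your es.nodes is reachable on port 9200 (hostname and port are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("es-connectivity-check").getOrCreate()
sc = spark.sparkContext

ES_URL = "http://data-node-1:9200"  # placeholder: use one of your es.nodes


def probe(_):
    # Runs on the executor: report which host ran the check and what happened.
    import socket
    import urllib.request
    try:
        with urllib.request.urlopen(ES_URL, timeout=5) as resp:
            return [(socket.gethostname(), "HTTP %d" % resp.status)]
    except Exception as exc:
        return [(socket.gethostname(), repr(exc))]


# One partition per default-parallelism slot is enough to sample the workers.
results = (sc.parallelize(range(sc.defaultParallelism), sc.defaultParallelism)
             .mapPartitions(probe)
             .collect())
for host, status in results:
    print(host, status)
```

If some worker hosts report connection errors while others report HTTP 200, that would point to a routing or firewall issue from those specific nodes rather than a connector misconfiguration.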
