Problems connecting to Elasticsearch cross-cluster search indexes from Databricks using the Spark connector

Hi All,
I am trying to connect to Elasticsearch from our Databricks cluster using elasticsearch_spark_30_2_12_7_16_3.jar. I am not able to read data from cross-cluster indexes, i.e. patterns starting with "*:xxxxxxx". However, I am able to read direct/local indexes with the snippet below:
df = (spark.read
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "xxxxxxxxxx")
      .option("es.port", "9200")
      .option("es.net.http.auth.user", user)
      .option("es.net.http.auth.pass", password)
      .option("es.nodes.wan.only", "true")
      .option("es.net.ssl", "true")
      .option("es.mapping.date.rich", "false")
      .option("es.read.field.include", "xxxxxx")
      .load("*:xxxxxxxxxxx-*")
)
Could you please suggest how to read data from cross-cluster indexes starting with "*:xxxxxx"?

Do you get an error message in your spark logs?

Hi Keith,

Py4JJavaError: An error occurred while calling o635.load.
: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot find mapping for *:logs-xxxxxx-xxx-xxxx-* - one is required before using Spark SQL
	at org.elasticsearch.spark.sql.SchemaUtils$.discoverMappingAndGeoFields(SchemaUtils.scala:107)
	at org.elasticsearch.spark.sql.SchemaUtils$.discoverMapping(SchemaUtils.scala:93)
	at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema$lzycompute(DefaultSource.scala:238)
	at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema(DefaultSource.scala:238)
	at org.elasticsearch.spark.sql.ElasticsearchRelation.$anonfun$schema$1(DefaultSource.scala:242)
	at scala.Option.getOrElse(Option.scala:189)
	at org.elasticsearch.spark.sql.ElasticsearchRelation.schema(DefaultSource.scala:242)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:498)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:375)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:331)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:331)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:237)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)

Unfortunately, that is not supported in es-hadoop. See Unable to load the data from ELK using Databricks · Issue #2102 · elastic/elasticsearch-hadoop · GitHub.
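Since the connector cannot discover a mapping through the cross-cluster "*:" syntax, one possible workaround (a sketch only, assuming the remote cluster's HTTP endpoint is reachable from Databricks; the host name and index pattern below are placeholders, not values from this thread) is to point `es.nodes` directly at the remote cluster and load a plain local index pattern:

```python
def remote_es_options(host, port, user, password):
    """Build connector options that target the remote cluster directly,
    so the index pattern can be a plain local one (no "*:" prefix).
    All option keys are standard es-hadoop settings used earlier in
    this thread."""
    return {
        "es.nodes": host,                 # the remote cluster's own nodes
        "es.port": str(port),
        "es.net.http.auth.user": user,
        "es.net.http.auth.pass": password,
        "es.nodes.wan.only": "true",
        "es.net.ssl": "true",
    }

# Hypothetical usage from Databricks (host and index are placeholders):
# df = (spark.read
#       .format("org.elasticsearch.spark.sql")
#       .options(**remote_es_options("remote-cluster-host", 9200, user, password))
#       .load("logs-*"))  # local pattern on the remote cluster, no "*:" prefix
```

This bypasses cross-cluster search entirely, so it only helps when you can reach the remote cluster directly rather than only through the coordinating cluster.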

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.