Elasticsearch-hadoop inserting query as field

I am trying to use elasticsearch-hadoop v5.6.2 with pyspark and I have the following issue. The sample code is very simple, just to get the hang of it:

df = spark.read.format('org.elasticsearch.spark.sql').load('index/type')

What I get as a result contains the following:

|-- query: struct (nullable = true)
|    |-- match_all: struct (nullable = true)

which does not exist in my index mapping, leading to an exception being raised. Sometimes, after running a job that contained a filter, a subsequent job without any filtering picks up the filtering query in the schema, like this:

|-- query: struct (nullable = true)
|    |-- bool: struct (nullable = true)
|    |    |-- filter: struct (nullable = true)
|    |    |    |-- exists: struct (nullable = true)
|    |    |    |    |-- field: string (nullable = true)
|    |    |    |-- term: struct (nullable = true)
|    |    |    |    |-- http_user: string (nullable = true)
|    |    |-- must: struct (nullable = true)
|    |    |    |-- match_all: struct (nullable = true)
|    |-- match_all: struct (nullable = true)
|-- received_from: string (nullable = true)

Jobs were run against different indices, both in local mode (--master=local) and in cluster mode using Mesos as the cluster manager. What could be the issue?
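Not from the original post, but one way to keep the spurious field out of the DataFrame while debugging is the connector's read-time field exclusion setting. A minimal sketch; the host and index name are placeholders, and the `spark.read` call is shown commented since it needs a running cluster:

```python
# Assumed connector options; "es.read.field.exclude" is a real
# elasticsearch-hadoop setting, the host and index are placeholders.
es_options = {
    "es.nodes": "localhost:9200",      # placeholder Elasticsearch host
    "es.read.field.exclude": "query",  # drop the unexpected "query" field on read
}

# df = (spark.read.format("org.elasticsearch.spark.sql")
#       .options(**es_options)
#       .load("index/type"))

print(sorted(es_options))
```

This only hides the field on the Spark side; it does not explain how `query` ended up in the mapping in the first place.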


If you configure your index with dynamic mapping disabled, then any call that tries to add new fields to the mapping without going through the mapping endpoint directly should fail. It's possible that something is sending a query as part of an index request instead of a search request.
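To illustrate the suggestion above, here is a sketch of an index mapping with dynamic mapping set to strict, so that an indexing request carrying an unexpected `query` field is rejected instead of silently widening the mapping. The index and type names are placeholders taken from the example load path; only `received_from` is copied from the schema shown earlier:

```python
import json

# Hypothetical 5.x-style mapping body with dynamic mapping set to "strict":
# unknown fields in documents cause indexing to fail rather than being
# added to the mapping automatically.
mapping = {
    "mappings": {
        "type": {  # mapping type name, a placeholder
            "dynamic": "strict",
            "properties": {
                "received_from": {"type": "keyword"},
            },
        }
    }
}

body = json.dumps(mapping)
# PUT this body when creating the index, e.g.:
#   curl -XPUT 'localhost:9200/index' -d "$body"
print(body)
```

With this in place, whichever component is injecting the query into an index request should surface as a mapping exception rather than a new `query` field in the schema.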

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.