Elasticsearch-hadoop inserting query as field

Hello,
I am trying to use elasticsearch-hadoop v5.6.2 with PySpark and I have the following issue. The sample code is very simple, just to get the hang of it:

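from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # already defined as spark in the pyspark shell
df = spark.read.format('org.elasticsearch.spark.sql').load('index/type')
df.printSchema()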

What I get as a result contains the following:

|-- query: struct (nullable = true)
|    |-- match_all: struct (nullable = true)

This query field does not exist in my index mapping, and it causes an exception to be raised. Sometimes, after running a job that contained a filter, a later job with no filter at all ends up with that filter query added to its schema, like this:

|-- query: struct (nullable = true)
|    |-- bool: struct (nullable = true)
|    |    |-- filter: struct (nullable = true)
|    |    |    |-- exists: struct (nullable = true)
|    |    |    |    |-- field: string (nullable = true)
|    |    |    |-- term: struct (nullable = true)
|    |    |    |    |-- http_user: string (nullable = true)
|    |    |-- must: struct (nullable = true)
|    |    |    |-- match_all: struct (nullable = true)
|    |-- match_all: struct (nullable = true)
|-- received_from: string (nullable = true)

The jobs were run against different indices, both in local mode (--master=local) and in cluster mode with Mesos as the cluster manager. What could be the issue?


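If you configure your index with dynamic mapping set to strict, any request that tries to add new fields to the mapping, rather than updating the mapping explicitly through the mapping endpoint, will fail. It's possible that something is sending a query as part of an index request instead of a search request (for example, POSTing a query body to index/type instead of index/type/_search). That indexes the query itself as a document, so its fields end up in the mapping, and elasticsearch-hadoop then reports them in the schema it builds from that mapping.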
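A minimal sketch of that failure mode, in plain Python with no Elasticsearch client: it only builds the HTTP method, path and JSON body, so you would still need an HTTP client of your choice to send them. It assumes the Elasticsearch 5.x REST conventions, where POSTing a body to /index/type indexes it as a new document while /index/type/_search runs it as a search. The names INDEX, DOC_TYPE, build_request and strict_index_body are hypothetical, and the keyword type chosen for received_from is an assumption.

```python
import json

# Hypothetical index and type names, used only for illustration.
INDEX = "index"
DOC_TYPE = "type"

# A search body with the same shape as the unexpected query struct in the schema.
search_body = {"query": {"match_all": {}}}


def build_request(index, doc_type, body, search=True):
    """Return (method, path, json_body) for an Elasticsearch 5.x REST call.

    Hypothetical helper. With search=True the body goes to the _search
    endpoint and is executed as a query. With search=False it is POSTed to
    /index/type, which indexes the body as a new document, so dynamic
    mapping adds its fields (here query.match_all) to the index mapping.
    """
    if search:
        path = f"/{index}/{doc_type}/_search"
    else:
        path = f"/{index}/{doc_type}"
    return "POST", path, json.dumps(body)


# Index creation body (ES 5.x, mappings keyed by type name) that rejects any
# document containing a field not declared under properties.
strict_index_body = {
    "mappings": {
        DOC_TYPE: {
            "dynamic": "strict",
            "properties": {
                # Assumed field type; adjust to match your real mapping.
                "received_from": {"type": "keyword"},
            },
        }
    }
}

if __name__ == "__main__":
    print(build_request(INDEX, DOC_TYPE, search_body))                # intended search
    print(build_request(INDEX, DOC_TYPE, search_body, search=False))  # accidental index
    print(json.dumps(strict_index_body, indent=2))
```

With the strict mapping in place, the second (accidental) request would be rejected with a strict_dynamic_mapping_exception instead of silently adding query to the mapping, which makes the offending client easy to spot.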
