I'm trying to read data from Elasticsearch with Spark on an index that might not exist, since my index names follow a date pattern. Because this is expected in my situation, I just want an empty dataset in that case. Is there a way to do it?
I'm using Spark 2.2.1 and Elasticsearch 5.6.7. I tried setting "es.index.read.missing.as.empty" to yes and providing my StructType, without any luck.
Here is a sample of my code:
import java.sql.Timestamp

import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

// Schema passed explicitly so the connector shouldn't need to fetch the index mapping
val elasticSearchSchema = new StructType()
  .add("name", StringType)
  .add("client", StringType)
  .add("timestamp", TimestampType)

// queryStartDate and startPeriod are java.time.Instant values defined elsewhere in the job
val keepOnlySessionWithinPeriod =
  col("timestamp")
    .geq(lit(Timestamp.from(queryStartDate)))
    .and(col("timestamp")
      .lt(lit(Timestamp.from(startPeriod))))

val loadFromElastic = sparkSession
  .read
  .option("es.index.read.missing.as.empty", "true")
  .schema(elasticSearchSchema)
  .format("org.elasticsearch.spark.sql")
  .load("my-index-2018-02-01/mytype")
  .filter(keepOnlySessionWithinPeriod)
With Spark SQL the behaviour is an exception rather than an empty result, and in my specific case I don't query multiple indices. Since my indices are time based I know how they are named; I just don't know whether a given one exists.
Right now I'm getting an exception and the Spark process terminates. What I'm expecting is an empty result so that I can continue my process and re-insert data into Elasticsearch, like I can do with a plain HTTP request using the ignore_unavailable and allow_no_indices flags.
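The only workaround I can see for now is to check myself whether the index exists before calling load, and to build an empty Dataset from my schema when it doesn't. A rough sketch of what I mean (the host localhost:9200 and the indexExists helper are just for illustration, not part of my real job):

import java.net.{HttpURLConnection, URL}
import org.apache.spark.sql.{DataFrame, Row}

// HEAD on the index URL: 200 when the index exists, 404 when it doesn't
def indexExists(esHost: String, index: String): Boolean = {
  val connection = new URL(s"http://$esHost/$index")
    .openConnection().asInstanceOf[HttpURLConnection]
  connection.setRequestMethod("HEAD")
  try connection.getResponseCode == 200
  finally connection.disconnect()
}

val loadFromElastic: DataFrame =
  if (indexExists("localhost:9200", "my-index-2018-02-01"))
    sparkSession
      .read
      .schema(elasticSearchSchema)
      .format("org.elasticsearch.spark.sql")
      .load("my-index-2018-02-01/mytype")
      .filter(keepOnlySessionWithinPeriod)
  else
    // Empty Dataset with the same schema, so the rest of the job keeps working
    sparkSession.createDataFrame(sparkSession.sparkContext.emptyRDD[Row], elasticSearchSchema)

But that means an extra round trip per index and duplicating logic the connector already has, which is why I'd prefer a supported option.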
I've been able to dig down to the fact that, even when we provide a schema, the DataSource class still tries to load the mapping. See DefaultSource.scala.
I did try to fix it, but one other test started failing and getting an integration test running was nearly impossible: I had to wait 30+ minutes to test my code. For now I've been able to use my global alias to search my data.
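In other words, instead of loading the dated index directly I read through an alias that always exists (my-global-alias is just a placeholder for whatever alias covers the dated indices) and let the timestamp filter narrow things down:

// Reading through an always-existing alias instead of the dated index
val loadFromAlias = sparkSession
  .read
  .schema(elasticSearchSchema)
  .format("org.elasticsearch.spark.sql")
  .load("my-global-alias/mytype")
  .filter(keepOnlySessionWithinPeriod)

It works, but it scans more than I'd like, so a real equivalent of ignore_unavailable would still be welcome.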