Env:
using this jar: elasticsearch-spark-13_2.10-5.1.1.jar
sqlContext.read.format("org.elasticsearch.spark.sql")
  .option("es.nodes", es_url)
  .option("es.port", "443")
  .option("es.nodes.wan.only", "true")
  .option("es.net.ssl", "true")
  .option("es.read.field.as.array.include", array_with_comma)
  .option("es.mapping.date.rich", "false")
  .option("es.read.field.exclude", exclude_with_comma)
  .option("es.read.field.include", "")
  .option("pushdown", "true")
  .load(es_index)
We are not passing any other arguments; es.read.field.include is left empty, and es.read.field.exclude only drops a couple of top-level fields.
The problem we have is that some fields are missing from dataframe.printSchema().
Does the connector derive the schema from the index _mapping, or by sampling documents? I couldn't find any documentation on this.
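To narrow this down on our side, here is a minimal sketch (plain Scala, no Spark or Elasticsearch; the sample mapping and all names in it are made up for illustration) that flattens a _mapping-style properties tree into dotted field names, so the result can be diffed against what printSchema() shows:

```scala
// Sketch: flatten a (hypothetical) _mapping "properties" tree into dotted
// leaf-field names, to compare against dataframe.printSchema() output.
// The sample mapping below is an assumption, not our real index mapping.
object MappingFields {
  def leaves(props: Map[String, Any], prefix: String = ""): List[String] =
    props.toList.flatMap { case (name, spec) =>
      val path = if (prefix.isEmpty) name else s"$prefix.$name"
      spec match {
        case m: Map[String, Any] @unchecked =>
          m.get("properties") match {
            // object field: recurse into its sub-properties
            case Some(sub: Map[String, Any] @unchecked) => leaves(sub, path)
            // leaf field with a concrete type
            case _ => List(path)
          }
        case _ => List(path)
      }
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical mapping for illustration only
    val properties: Map[String, Any] = Map(
      "title" -> Map("type" -> "text"),
      "author" -> Map("properties" -> Map(
        "name" -> Map("type" -> "keyword"),
        "age"  -> Map("type" -> "integer")
      ))
    )
    // prints: author.age, author.name, title (one per line)
    leaves(properties).sorted.foreach(println)
  }
}
```

If a field listed by the mapping is absent from printSchema(), that would point at the connector (or our exclude list) rather than the index itself.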
Thanks.