WARN ScalaRowValueReader: Field 'hits.hits._score' is backed by an array but the associated Spark Schema does not reflect this


(Madhur Chopra) #1

I am trying to access the _id field from the document metadata:

from pyspark.sql.functions import udf

# Pull the document _id out of the metadata map added by es.read.metadata
@udf('string')
def get_id(metadata):
    return metadata['_id']

def get_index(server, index_name, query):
    # sqlContext is assumed to already exist in the session
    df = sqlContext.read.format("org.elasticsearch.spark.sql")\
            .option("es.nodes", server)\
            .option("es.query", query)\
            .option("es.read.metadata", "true")\
            .option("es.read.metadata.field", "metadata")\
            .option("es.read.field.empty.as.null", "yes")\
            .option("es.index.read.missing.as.empty", "yes")\
            .option("es.read.field.validate.presence", "ignore")\
            .load(index_name)

    # Add the _id as a regular column named Id
    return df.withColumn('Id', get_id(df.metadata))

query = """{"query":{ "match_all": {}}}"""

The strange part is that when I call get_index() and do .select('Id') on the returned DataFrame, I see this warning, but when I select all the fields, I do not.

I have tried setting es.read.field.as.array.include to hits.hits._score, but the warning doesn't go away.
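
For reference, a minimal sketch of how that setting could be passed along with the other reader options. The helper name build_es_read_options and the "localhost" host are made up for illustration; the only assumption about the connector is that es.read.field.as.array.include accepts a comma-separated list of field names.

```python
def build_es_read_options(server, query, array_fields=()):
    """Collect the connector settings from get_index() into one dict (hypothetical helper)."""
    options = {
        "es.nodes": server,
        "es.query": query,
        "es.read.metadata": "true",
        "es.read.metadata.field": "metadata",
    }
    if array_fields:
        # Assumption: the setting takes a comma-separated list of field names
        options["es.read.field.as.array.include"] = ",".join(array_fields)
    return options


# "localhost" is a placeholder host, not a value from the post
opts = build_es_read_options(
    "localhost",
    '{"query":{"match_all":{}}}',
    array_fields=["hits.hits._score"],
)

# With a live cluster the dict could be applied in one call:
# df = sqlContext.read.format("org.elasticsearch.spark.sql").options(**opts).load(index_name)
```

Keeping the options in a dict makes it easy to print exactly what is sent to the connector when comparing runs with and without the array setting.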

Thanks in advance for any help.

