How to specify which analyzed Elasticsearch field is used by a SQL query

(Robbie) #1

My ES mapping has each field analyzed in a few different ways. Can I control which of the analyzed fields is used when I run a Spark SQL query through the ES connector? Currently, it looks like it uses the non-analyzed version if "strict" is true, and the analyzed version otherwise.

For instance, below is one of the fields that I have in my ES index:

    "name": {
      "type": "string",
      "analyzer": "custom_pattern_index",
      "fields": {
        "nonanalyzed": { "type": "string", "analyzer": "custom_non_analyzer" },
        "whitespace": { "type": "string", "analyzer": "custom_whitespace_analyzer" }
      }
    }

In such a case, how do I ensure that my SQL query or DataFrame.filter runs against name.whitespace instead of name?


(Costin Leau) #2

There isn't any option yet in ES-Spark to change the field name to a different one. It shouldn't be hard to do; however, the question remains whether to do it globally (so the new field applies to all filters used in a Spark context) or per DataFrame.
Also, aside from not-analyzed fields, I'm wondering whether there is anything else a single configuration would need to cover in order to include all the options.
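
Until such an option exists, one possible workaround (not confirmed in this thread, so treat it as a sketch) is to skip the translated Spark filter and hand the connector a query DSL string through its es.query setting, naming the sub-field explicitly. The helper names, the "myindex/mytype" resource and the sample text below are hypothetical; es.query and the org.elasticsearch.spark.sql format are the connector's documented names.

```python
import json


def build_es_options(field, text):
    """Build connector options carrying a match query on the given field.

    `field` can be a multi-field path such as "name.whitespace", so the
    sub-field's analyzer is the one applied to `text` on the ES side.
    """
    query = {"query": {"match": {field: text}}}
    # es.query accepts a query DSL document serialized as a JSON string.
    return {"es.query": json.dumps(query)}


def load_filtered(spark, resource, field, text):
    """Load a DataFrame pre-filtered by Elasticsearch (hypothetical helper).

    `spark` is an existing SparkSession with the ES-Hadoop jar on its
    classpath; `resource` is the index to read, e.g. "myindex/mytype".
    """
    return (
        spark.read.format("org.elasticsearch.spark.sql")
        .options(**build_es_options(field, text))
        .load(resource)
    )
```

Calling load_filtered(spark, "myindex/mytype", "name.whitespace", "john smith") would then return only the documents that matched on name.whitespace. Any further DataFrame.filter calls still go through the connector's usual translation, so the field choice has to live in the es.query string.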
