I am inserting data from a Hive table into Elasticsearch using the ES-Hadoop SerDe. The data consist of multiple JSON files with highly sparse fields, so I am using a mapping like "app map<string, string>" for each JSON root element. The problem is that a nested field name sometimes contains a '.' (dot) character, which kills the entire Hive job, since Elasticsearch cannot accept a field name containing a dot.
For example, an incoming record looks like:
app {
"adv.id": "efT3Fg5JnvJVs57IOnc"
}
Here the field name "adv.id" contains a '.' (dot) character, and the error is:
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [127.0.0.1:9200] returned Bad Request(400) - Field name [adv.id] cannot contain '.'; Bailing out..
I solved a similar problem when I was using Logstash, with a piece of Ruby code that replaces all dot characters with underscores (as discussed here); a minimal sketch of that filter follows.
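The sketch assumes the Logstash 5.x event API (event.get/event.set/event.remove) and only renames top-level keys; nested keys would need recursion:

filter {
  ruby {
    code => "
      # rename any top-level key that contains a dot
      event.to_hash.keys.each do |k|
        next unless k.include?('.')
        event.set(k.gsub('.', '_'), event.get(k))
        event.remove(k)
      end
    "
  }
}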
Is there any option or configuration in the ES-Hadoop SerDe to replace the dot character with another character before pushing into Elasticsearch, or simply to drop the fields whose names contain a dot?
Thanks.