Removing '.' (DOT) character from field name using ES-Hadoop SerDe


(Gowtham Sadasivam) #1

I am inserting data from a Hive table into Elasticsearch using the ES-Hadoop SerDe. The data consist of multiple JSON files containing highly sparse fields, and I am using a mapping like "app map<string, string>" for all the JSON root elements. The problem is that a nested field name sometimes contains a '.' (dot) character, which terminates the entire Hive job, since Elasticsearch cannot accept a field name containing a dot.

For Example:

app {
  "adv.id": "efT3Fg5JnvJVs57IOnc"
}

^ Here the field name "adv.id" contains the '.' (dot) character. The error will be:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [127.0.0.1:9200] returned Bad Request(400) - Field name [adv.id] cannot contain '.'; Bailing out..

I solved a similar problem when using Logstash, with a piece of Ruby code that replaced all the dot characters with underscores (as discussed here).

Is there any option/configuration in the ES-Hadoop SerDe to replace the dot character with another character before pushing data into Elasticsearch, or to simply drop the fields whose names contain a dot?

Thanks.
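(For anyone hitting the same wall: the key-renaming approach described above can also be done as a preprocessing step on the JSON files themselves, before they reach Hive. A minimal sketch in Python; the helper name and the choice of underscore as the replacement character are mine, not part of ES-Hadoop:)

```python
import json

def sanitize_keys(obj, old=".", new="_"):
    """Recursively rename dict keys, replacing `old` with `new`.

    Handles nested dicts and lists, returning a new structure;
    non-container values are passed through unchanged.
    """
    if isinstance(obj, dict):
        return {k.replace(old, new): sanitize_keys(v, old, new)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [sanitize_keys(v, old, new) for v in obj]
    return obj

# The problematic document from the example above:
doc = {"app": {"adv.id": "efT3Fg5JnvJVs57IOnc"}}
print(json.dumps(sanitize_keys(doc)))
# → {"app": {"adv_id": "efT3Fg5JnvJVs57IOnc"}}
```

Note that, as the reply below points out, any renaming done on insert also has to be remembered on the query side, since queries and scripts will only see the rewritten names.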


(Costin Leau) #2

No, not at the moment, and it is unlikely there will be one.
The dot restriction has affected a large number of folks, and it is not a decision that was taken lightly. Do note that work is underway to improve the situation in ES 5.x, as mentioned here.
Since this affects ES 2.x, the issue is how to convert the dot, and then how to handle reading it back. It's not a clean situation, and one that ES-Hadoop tries to move away from, since its goal is to abstract ES.
Because a field name is used in various places, hiding the dot would potentially work only for inserts; in the case of scripts, for example, a user would still have to be aware of the substitution, otherwise the field name would not be recognized.
Hence my reluctance to add some kind of 'translator' that removes the dot.