Es response with field name with dots to bind in hive

I tired to bind this response to hive and got failed.

{
    "took":3068,
    "timed_out":false,
    "_shards":{
        "total":348,
        "successful":348,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":238,
        "max_score":0,
        "hits":[
            {
                "_id":"QY9QcmIB_VYGYMGMm28T",
                "_score":0,
                "_source":{
                    "nx.request_body":"...",
                    "nx.request_url":"...",
                    "nx.request_length":"..."
                }
            }
        ]
    }
}

enclosing json field has dot separate delimiter.
and here is hive schema definition.

CREATE EXTERNAL TABLE ncp.tmp_nginx_center_photoprint (
    prospector STRUCT <type:STRING>,
    beat STRUCT <
        hostname:STRING,
        name:STRING,
        version:STRING
    >,
    nx.request_body STRING,  // parsing error
    body STRING  // didnt work (bind to NULL)
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
    'es.mapping.names' = 'body:nx.request_body',
);
  1. hive can't work with column name with special characters like . so nx.request_body STRING is failed
  2. tried to map nx.request_body to newly defined name (body) and use it in DDL but binding result is always NULL (maybe value is not bound properly)

Any idea to bind field name with dot into hive table?
I can workaround with renaming all field in ES query like

"script_fields": {
    "request_url": {
      "script": "params['_source']['nx.request_url']"
    },
    "request_body": {
      "script": "params['_source']['nx.request_body']"
    }
}

But It doesn't good nice, (should re-defined all required fields line by line) so Im looking forward to knowing spec of this problem can be solved or not

This is a known issue: https://github.com/elastic/elasticsearch-hadoop/issues/853

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.