Hive integration with Elasticsearch show nulls fileds

Hi,

I am trying to map a index in elasticsearch to store the data in hive, but when I query the index via hive shows some fields with null values and some with data.

Here es my configuration:

CREATE external TABLE proxybg_elastic(
logver string,
dtime string,
devid string,
devname string,
vd string,
date timestamp,
date_time timestamp,
logid string,
type_t string,
subtype string,
level string,
srcip string,
srcport string,
srcintf string,
dstip string,
dstport string,
dstinf string,
sessionid string,
poluuid string,
dstcountry string,
srccountry string,
service string,
wanoptapptype string,
proto string,
action string,
duration string,
policyid string,
policytype string,
`user` string,
`group` string,
wanin string,
rcvdbyte bigint,
wanout string,
lanin string,
sentbyte bigint,
lanout string,
appid string,
app string,
appcat string,
apprisk string,
applist string,
utmaction string,
countweb string,
countapp string,
crscore string,
craction string,
geoip struct<continent_name:string,country_iso_code:string,location:map<string,double>>
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES( 'es.net.http.auth.user'='elastic', 'es.net.http.auth.pass'='Galicia2017',  'es.nodes' = 'hadoopedge02.bancogalicia.com.ar', 'es.resource' = 'proxybg-*','es.field.read.validate.presence' = 'strict');

And here is a sample of what the query result shows:

hdfs@cdh001-e01:~$ hive -e "select * from proxybg.proxybg_elastic limit 100;"
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
17/06/15 10:17:26 WARN mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present.  Continuing without it.
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0

Logging initialized using configuration in jar:file:/var/opt/teradata/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/jars/hive-common-1.1.0-cdh5.8.0.jar!/hive-log4j.properties
OK
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
54      1497339590      FG1K5D3I13800108        FortiCluster01  ids_ips 2017-06-13 00:00:00     2017-06-13 07:39:50     0004000017      traffic sniffer notice  172.17.2.144    50018   "port31"10.0.55.4       3338    "port31"        108364909       NULL    Reserved        Reserved        tcp/3338        NULL    6       accept  1       1       sniffer NULL    NULL    NULL    1776   NULL     NULL    1832    NULL    7       Edonkey P2P     high    "sniffer-profile"       allow   NULL    1       NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL   NULL     NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
54      1497339598      FG1K5D3I13800108        FortiCluster01  ids_ips 2017-06-13 00:00:00     2017-06-13 07:39:58     0004000017      traffic sniffer notice  190.231.25.34   49253   "port31"172.17.2.71     443     "port31"        88380554        NULL    Reserved        Argentina       HTTPS   NULL    6       accept  257     1       sniffer NULL    NULL    NULL    16855   NULL   NULL     115167  NULL    40568   HTTPS.BROWSER   Web.Client      medium  "sniffer-profile"       allow   NULL    1       NULL    NULL    NULL
54      1497339597      FG1K5D3I13800108        FortiCluster01  ids_ips 2017-06-13 00:00:00     2017-06-13 07:39:57     0004000017      traffic sniffer notice  186.143.133.48  42448   "port31"172.17.2.191    443     "port31"        88391345        NULL    Reserved        Argentina       HTTPS   NULL    6       accept  45      1       sniffer NULL    NULL    NULL    1367    NULL   NULL     6352    NULL    40568   HTTPS.BROWSER   Web.Client      medium  "sniffer-profile"       allow   NULL    1       NULL    NULL    NULL
54      1497339597      FG1K5D3I13800108        FortiCluster01  ids_ips 2017-06-13 00:00:00     2017-06-13 07:39:57     0004000017      traffic sniffer notice  186.141.133.165 10954   "port31"172.17.1.37     443     "port31"        128295713       NULL    Reserved        Argentina       HTTPS   NULL    6       accept  1       1       sniffer NULL    NULL    NULL    979     NULL   NULL     551     NULL    41540   SSL_TLSv1.2     Network.Service medium  "sniffer-profile"       allow   NULL    2       NULL    NULL    NULL

I am missing something?

Please advice,

Regards,

Jonathan.

I see that you are using es.resource with just an index name and wild card. Do the fields show up when you specify the type as well?

Hi James,

Yes I have tried both, set the es.resource with the wild card, and with type as well but for both there are some fields that shows null and some with info.

Here are some possible reasons:
Why some columns are NULL: Hive is case insensitive while Elasticsearch is not. The loss of information can create invalid queries (as the column in Hive might not match the one in Elasticsearch)
Refer https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html#hive-alias

Why whole row is NULL: The row in ES might be some data not in the passed format, like metadata, which is being read as an actual data

Thanks

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.