Hive read operation fails when stored as external table pointing to Elastic search location


(Vighnesh) #1

Hi,

I have 5.10.2 CDH Kerberos enabled 6 node Hadoop cluster and I have installed Elasticsearch 6.1.4 on Edge-node (Single node ES).
I am trying to to load data from hive to Elasticsearch and read it in hive.

for ex: I have a table customer (a managed hive table). I am creating an external table like below:
CREATE EXTERNAL TABLE customer_es(
cust_id string,
polcy_policy_num string,
prod_plan_cd string,
comp_cd int,
prod_nm string,
polcy_pol_status int,
rider_prm int,
gentype_cd string,
prm_pymt_mde_id int,
cust_occupation string,
marstatus_id string,
cust_no_of_dpnds int,
polcycov_cvrg_amt double,
custaddrs_add_line1 string,
custaddrs_add_line2 int,
custaddrs_add_line3 string,
city_cd string,
state_cd string,
zip_cd int,
tapestry_segment string,
propensity_score int,
rank1_product string,
rank2_product string,
rank3_product string,
rank4_product string,
rank5_product string,
hh_income int,
age int,
estimatedinforce_anp double,
propensity_threshold string)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'customer/custdata','es.nodes'='ab-edge-node1:9200','es.index.auto.create'='true','es.mapping.id'='cust_id');

Once this table is created, I will load the data from hive to ES as below:

insert overwrite table customer_es select * from customer ;

logs:

Query ID = hduser_20180501151212_1e2da089-7411-48fb-beb8-2498c043ef21
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1524755329957_0130, Tracking URL = http://ab-master-node2:8088/proxy/application_1524755329957_0130/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1524755329957_0130
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2018-05-01 15:12:33,113 Stage-0 map = 0%, reduce = 0%
2018-05-01 15:12:45,737 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 1.88 sec
MapReduce Total cumulative CPU time: 1 seconds 880 msec
Ended Job = job_1524755329957_0130
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1 Cumulative CPU: 1.88 sec HDFS Read: 12024 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 880 msec
OK
Time taken: 34.634 seconds

In the above log HDFS read i see some no. of bytes read, but write is zero. However when i curl to my elasticsearch node and see, a new index (customer) with data is created.

Now the issue is when i try to do : Select * from customer_es; I get below error

Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable

Need some help to resolve this issue.


(James Baiera) #2

Can you provide some mappings and test data to reproduce the error? My guess is that you have a field that is mapped as a Long in Elasticsearch, but is defined as an Int in your Hive DDL.


(Vighnesh) #3

Hi James,

Thanks for ur reply.. PFB the sample data
01 CLAUDIA-MCRAE-11161960-531382437,ATR78038T,F,11,FLEX-PREM-ANNUITY,99,302,M,0,NA,NA,0,15077.87,4120 HELENS POUROFF AVE,0,,FREDERICK,MD,89085,?��,1,Product1,Product4,Product5,Product7," Product6",0,36,1200.52,High propensity
01 ILEEN-MCCALLISTER-11181970-762786023,ATR125570T,F,10,FLEX-PREM-ANNUITY,49,906,M,0,NA,Married,0,0,4421 OASIS HILL AVE,0,,ROSEDALE,MD,89085,?��,1,Product2,Product5,Product1,Product7," Product6",0,41,1204.52,High propensity

More interestingly i created one more table with just 2 rows and 3 columns. insert overwrite happens and data is available in ES. But when i query the table neither i get output nor error.

And I have not mapped explicitly any columns other than cust_id in DDL of external table. Can you please help me with the mappings if you see that as a problem?
Thanks in advance !!!


(Vighnesh) #4

Hi James,

PFB the mappings.

"properties": {
"age": {
"type": "long"
},
"city_cd": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"comp_cd": {
"type": "long"
},
"cust_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"cust_no_of_dpnds": {
"type": "long"
},
"cust_occupation": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"custaddrs_add_line1": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"custaddrs_add_line2": {
"type": "long"
},
"custaddrs_add_line3": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"estimatedinforce_anp": {
"type": "float"
},
"gentype_cd": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"hh_income": {
"type": "long"
},
"marstatus_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"polcy_pol_status": {
"type": "long"
},
"polcy_policy_num": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"polcycov_cvrg_amt": {
"type": "float"
},
"prm_pymt_mde_id": {
"type": "long"
},
"prod_nm": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"prod_plan_cd": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"propensity_score": {
"type": "long"
},
"propensity_threshold": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"rank1_product": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"rank2_product": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"rank3_product": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"rank4_product": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"rank5_product": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"rider_prm": {
"type": "long"
},

Due to charters limitation.. removed few mappings.. 
   And I see, as you rightly pointed out some of the columns are mapped to long in ES and they are int in hive.. It would be great if you can help me with hive DDL to map it correctly to ES.

(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.