Hi,
I have a Kerberos-enabled, 6-node CDH 5.10.2 Hadoop cluster, and I have installed Elasticsearch 6.1.4 on the edge node (single-node ES).
I am trying to load data from Hive into Elasticsearch and then read it back in Hive.
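Before creating the table, I added the ES-Hadoop connector jar to my Hive session. The path below is just where the jar happens to live on my edge node:

-- register the ES-Hadoop connector so Hive can resolve EsStorageHandler/EsSerDe
-- (the jar location is specific to my setup)
ADD JAR /opt/elasticsearch-hadoop-6.1.4/dist/elasticsearch-hadoop-6.1.4.jar;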
For example, I have a table customer (a managed Hive table), and I create an external table like the one below:
CREATE EXTERNAL TABLE customer_es(
cust_id string,
polcy_policy_num string,
prod_plan_cd string,
comp_cd int,
prod_nm string,
polcy_pol_status int,
rider_prm int,
gentype_cd string,
prm_pymt_mde_id int,
cust_occupation string,
marstatus_id string,
cust_no_of_dpnds int,
polcycov_cvrg_amt double,
custaddrs_add_line1 string,
custaddrs_add_line2 int,
custaddrs_add_line3 string,
city_cd string,
state_cd string,
zip_cd int,
tapestry_segment string,
propensity_score int,
rank1_product string,
rank2_product string,
rank3_product string,
rank4_product string,
rank5_product string,
hh_income int,
age int,
estimatedinforce_anp double,
propensity_threshold string)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'customer/custdata',
  'es.nodes' = 'ab-edge-node1:9200',
  'es.index.auto.create' = 'true',
  'es.mapping.id' = 'cust_id');
Once this table is created, I load the data from Hive into ES as below:
insert overwrite table customer_es select * from customer;
Logs:
Query ID = hduser_20180501151212_1e2da089-7411-48fb-beb8-2498c043ef21
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1524755329957_0130, Tracking URL = http://ab-master-node2:8088/proxy/application_1524755329957_0130/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1524755329957_0130
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2018-05-01 15:12:33,113 Stage-0 map = 0%, reduce = 0%
2018-05-01 15:12:45,737 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 1.88 sec
MapReduce Total cumulative CPU time: 1 seconds 880 msec
Ended Job = job_1524755329957_0130
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1 Cumulative CPU: 1.88 sec HDFS Read: 12024 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 880 msec
OK
Time taken: 34.634 seconds
In the log above, HDFS Read shows some bytes read but HDFS Write is zero. However, when I curl my Elasticsearch node I can see that a new index (customer) has been created and populated with data.
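For reference, the check I ran was something like this:

# confirm the auto-created index exists and holds documents
curl -XGET 'http://ab-edge-node1:9200/customer/_search?pretty&size=1'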
Now the issue: when I run select * from customer_es; in Hive, I get the error below:
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable
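I also looked at the mapping that es.index.auto.create generated, with something like:

# inspect the auto-created mapping for the customer index
curl -XGET 'http://ab-edge-node1:9200/customer/_mapping?pretty'

My guess is that the int columns were auto-mapped as long on the ES side, which might explain the LongWritable-to-IntWritable cast failure, but I am not sure how to fix it.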
I need some help resolving this issue.