ElasticSearch 2.1.2 JAR for Hadoop Integration is not working


(vivek singh) #1

Hi,
I downloaded the Elasticsearch 2.1.2 JAR and followed the guide to configure it in Hadoop. Everything looks OK, but I get a cast error when reading from the Elasticsearch source. The error message is below:
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.elasticsearch.hadoop.mr.WritableArrayWritable cannot be cast to org.apache.hadoop.io.Text

I checked the data types of the Elasticsearch and Hive fields and they are in sync. I even tried reading only the string-typed columns from Elasticsearch, but I get the same error. Any idea?


(vivek singh) #2

Any suggestions please?


(Costin Leau) #3

It's not clear from your post what is going on, but based on the snippet it might be that you are trying to read an array of strings. Note that ES doesn't differentiate between one and multiple values; it treats them all the same. What is the actual stack trace (in full), and what is your table definition?


(vivek singh) #4

A big thanks for the reply!
I am new to Elasticsearch. Elasticsearch is being used to store log file info. I am not sure of the schema, though when querying ES (GET _mapping) I can see the field data types. I tried declaring the same data types in Hive but I get the error. The Hive table definition is below:
CREATE EXTERNAL TABLE Log_Event_ICS_ES (
    product_version string,
    agent_host string,
    product_name string,
    temp_time_stamp bigint,
    log_message string,
    org_id string,
    log_datetime timestamp,
    message string,
    log_source_provider string,
    log_source_name string,
    log_message_for_trending string,
    index_only_message string,
    log_level string,
    code_source string,
    log_type string,
    full_message string,
    session_log_operation string,
    source_received_time timestamp
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'log_event_2015-05-11/log_event',
              'es.nodes' = '',
              'es.port' = ''
)
The actual stack trace is the same as what I provided. It gives just 3 lines of error message:
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.elasticsearch.hadoop.mr.WritableArrayWritable cannot be cast to org.apache.hadoop.io.Text


(vivek singh) #7

Hi Costin,

I have a better handle on ES now; the issue was indeed with the data type. Thanks for pointing it out.

I checked with my ES source team and they confirmed that the fields are of array type, along with the data type of each field's elements. I made the corresponding changes in Hive and everything now works!
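
For anyone landing here with the same ClassCastException, the kind of change described above might look roughly like the sketch below. The thread does not say which fields turned out to be arrays, so the table name `log_event_sketch` and the choice of `log_message` as the array column are purely hypothetical, and the column list is shortened for illustration. Connection properties are left out; set `es.nodes` and `es.port` as appropriate for your cluster.

```sql
-- Hypothetical sketch: declare a multi-valued ES field as a Hive array
-- instead of a plain string. Which fields need this depends on your data;
-- log_message is only an assumed example.
CREATE EXTERNAL TABLE log_event_sketch (
    product_name string,
    log_level    string,
    log_message  array<string>   -- was: log_message string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'log_event_2015-05-11/log_event');

-- Array columns can then be indexed or exploded in queries.
SELECT product_name, log_message[0] AS first_message
FROM log_event_sketch
LIMIT 10;
```

Since the ES mapping only reports the element type (for example string or text) and not whether a field holds several values, checking a few sample documents for JSON arrays is one way to find the columns that need the `array<...>` declaration.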


(Mohan P) #8

Hi Costin,
In my case the data type shows as text in ES, but when I query from Hive it is treated as an array.

Failed with exception java.io.IOException:java.lang.ClassCastException: org.elasticsearch.hadoop.mr.WritableArrayWritable cannot be cast to org.apache.hadoop.io.Text

Same as : ElasticSearch 2.1.2 JAR for Hadoop Integration is not working

