Is there any way to skip failing rows in ES while moving data from Hive to ES?


(Rohit Gupta) #1

I am trying to copy data (more than 300 million rows) from a Hive table to Elasticsearch using this query: insert overwrite table TableNameES select * from HiveTableName;

After some of the rows are inserted, I get the exception below. I know it is caused by a data type mismatch with the mapping in ES ...

Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [xx.xx.xx.xx:9200] returned Bad Request(400) - [MapperParsingException[failed to parse [pctblack.pctblack_raw]]; nested: NumberFormatException[For input string: "(S)"]; ]; Bailing out..

I want to skip inserting any row that causes an error, not only for this exception but for any other exception as well.


(James Baiera) #2

Hello!

I would advise filtering or transforming any troublesome data with the tools Hive provides before the final insert into your Elasticsearch index. The insert ... table syntax takes a select statement, which should allow you to fix up or filter out any data that shouldn't be coming along for the ride.
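
As a minimal sketch of that kind of filter, assuming pctblack is a struct column in HiveTableName whose pctblack_raw field is stored as a string and is mapped to a numeric type in ES (the table names come from the query above; the column layout is a guess based on the field name in the exception). In Hive, casting a non-numeric string such as "(S)" to DOUBLE yields NULL rather than an error, so the WHERE clause can drop those rows before they reach Elasticsearch:

```sql
-- Assumption: pctblack is a struct column and pctblack_raw is a string field in it.
-- CAST of a non-numeric string to DOUBLE returns NULL in Hive, so rows whose value
-- is present but not numeric (for example "(S)") are filtered out here.
INSERT OVERWRITE TABLE TableNameES
SELECT *
FROM HiveTableName
WHERE pctblack.pctblack_raw IS NULL                      -- keep rows with no value at all
   OR CAST(pctblack.pctblack_raw AS DOUBLE) IS NOT NULL; -- keep rows with a numeric value
```

This only covers the one field named in the exception. Any other field with a numeric mapping in ES would likely need a similar check, and a filter like this cannot catch errors that are unrelated to the data itself.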

