Insert overwrite not working through Hive-Elasticsearch integration after multiple Hive data-import jobs


(Krishna ) #1

I have been running multiple Hive jobs to check performance after a number of optimizations.
This morning I suddenly found that no table I create as an ES table in Hive can import data into the ES cluster.
I am creating the index explicitly with the following:

curl -X PUT "http://host:9200/mixed3?pretty" -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "index" : {
            "number_of_shards" : 10,
            "number_of_replicas" : 0,
            "refresh_interval" : -1
        }
    }
}'
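As a sanity check, it may help to confirm the index was actually created with these settings before running the Hive job (host and port assumed to match the PUT above):

```shell
# Inspect the live index settings; in particular, confirm
# refresh_interval is -1 as requested in the PUT above.
curl -X GET "http://host:9200/mixed3/_settings?pretty"

# Summary of health, shard count, doc count, and size for the index.
curl -X GET "http://host:9200/_cat/indices/mixed3?v"
```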

And my ES table is defined in Hive as:

CREATE EXTERNAL TABLE IF NOT EXISTS mixed3 (schema)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
    'es.resource' = 'mixed3/mixedtype3',
    'network.bind_host' = 'host_ip',
    'es.chunk_size' = '200000',
    'es.nodes' = 'host_name',
    'es.port' = '9200',
    'es.batch.write.retry.count' = '2',
    'es.batch.write.retry.wait' = '2',
    'es.nodes.discovery' = 'true',
    'es.write.rest.error.handlers' = 'log',
    'index.index_concurrency' = '30',
    'es.write.rest.error.handler.log.logger.name' = 'BulkErrors',
    'threadpool.bulk.queue_size' = '100000',
    'es.batch.size.entries' = '10000',
    'serialization.encoding' = 'ISO88591',
    'indices.store.throttle.max_bytes_per_sec' = '100mb'
);

My data comes from AWS S3, and I pick it up in this table:

CREATE TABLE IF NOT EXISTS PN_data (schema)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LOCATION 's3://bucket/'
TBLPROPERTIES("skip.header.line.count"="1");
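Before suspecting the ES side, it may be worth confirming the S3-backed source table actually returns rows for today's data (a sketch; table name taken from above):

```shell
# Count the rows Hive can read from the S3-backed table;
# if this prints 0, the INSERT OVERWRITE has nothing to write.
hive -e "SELECT COUNT(*) FROM PN_data;"
```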

My command to overwrite:
insert overwrite table mixed3 select * from PN_data;

When this runs, X-Pack shows 0 documents and an index rate of 0/sec.
However, when I tried the same overwrite against the tables created yesterday, it ran perfectly and loaded the data as well.
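One thing worth noting: the index above was created with `refresh_interval: -1`, which disables automatic refresh, so document counts reported by stats and monitoring will not update until a refresh is issued. That could account for a 0-document reading even if some data landed (though not necessarily for a 0/sec index rate). A quick check after the job finishes (host assumed as before):

```shell
# Force a refresh so any newly indexed documents become visible ...
curl -X POST "http://host:9200/mixed3/_refresh?pretty"

# ... then check the actual document count.
curl -X GET "http://host:9200/mixed3/_count?pretty"
```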

Any help would be appreciated. TIA.


(Krishna ) #2

Any comment would help me resolve this. Thanks!


(James Baiera) #3

I see you have 'es.write.rest.error.handlers'='log'. Do you see any error logs on your tasks that are sending data to ES? The log error handler ignores any failures sent to it and continues. It's possible that none of your records are even making it to Elasticsearch.
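Since the table also sets `'es.write.rest.error.handler.log.logger.name'='BulkErrors'`, any rejected documents should appear under that logger in the task logs. On YARN, a sketch for finding them (the application ID is a placeholder to be filled in from your job):

```shell
# Pull the aggregated task logs for the Hive job and look for
# entries from the "BulkErrors" logger configured in TBLPROPERTIES.
yarn logs -applicationId application_XXXXXXXXXXXXX_YYYY | grep -i "BulkErrors"
```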


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.