I have been running multiple jobs on Hive to check performance after a number of optimizations.
This morning I suddenly found that any table I create as an ES table in Hive is no longer able to load data into the ES cluster.
I am creating the index explicitly with the following:
curl -X PUT "http://host:9200/mixed3?pretty" -H 'Content-Type: application/json' -d'
{
  "settings" : {
    "index" : {
      "number_of_shards" : 10,
      "number_of_replicas" : 0,
      "refresh_interval": -1
    }
  }
}'
My ES table is defined in Hive as:
CREATE EXTERNAL TABLE IF NOT EXISTS mixed3(schema)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.resource' = 'mixed3/mixedtype3',
  'network.bind_host' = 'host_ip',
  'es.chunk_size' = '200000',
  'es.nodes' = 'host_name',
  'es.port' = '9200',
  'es.batch.write.retry.count' = '2',
  'es.batch.write.retry.wait' = '2',
  'es.nodes.discovery' = 'true',
  'es.write.rest.error.handlers' = 'log',
  'index.index_concurrency' = '30',
  'es.write.rest.error.handler.log.logger.name' = 'BulkErrors',
  'threadpool.bulk.queue_size' = '100000',
  'es.batch.size.entries' = '10000',
  'serialization.encoding' = 'ISO88591',
  'indices.store.throttle.max_bytes_per_sec' = '100mb');
My data comes from AWS (S3), which I read through this table:
CREATE TABLE IF NOT EXISTS PN_data (schema)
row format delimited fields terminated by '|' LOCATION 's3://bucket/' TBLPROPERTIES("skip.header.line.count"="1");
My command to overwrite:
insert overwrite table mixed3 select * from PN_data ;
When I run this, X-Pack monitoring shows 0 documents and an indexing rate of 0/sec.
However, when I overwrite the tables I created yesterday in the same way, it runs perfectly and loads the data.
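One check that might help narrow this down: the index above is created with refresh_interval set to -1, so Elasticsearch does not refresh it periodically and newly indexed documents may not show up in counts until a refresh happens. The sketch below forces a refresh and then asks for the document count directly. The ES_URL and INDEX variables and the es_url, es_refresh and es_count helper names are made up for illustration; _refresh and _count are the standard index endpoints. This is an assumption about where to look, not a confirmed cause.

```shell
# Assumed values: same placeholder host and index name as in the commands above.
ES_URL="http://host:9200"
INDEX="mixed3"

# Build the URL for an index-level endpoint such as _refresh or _count.
es_url() {
  printf '%s/%s/%s?pretty\n' "$ES_URL" "$INDEX" "$1"
}

# Force a refresh so documents indexed so far become visible to search and counts.
es_refresh() {
  curl -s -X POST "$(es_url _refresh)"
}

# Ask Elasticsearch how many documents the index currently exposes.
es_count() {
  curl -s "$(es_url _count)"
}
```

Running es_refresh and then es_count right after the insert overwrite would show whether documents are reaching the index at all. If the count is still 0, the problem is more likely on the write side (the Hive job or the connector settings) than in monitoring.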
Any help would be appreciated. TIA.