Why does my master node receive many POST bulk requests when I use the es-hive-connector to create an index?

My Hive cluster has more than 30 nodes and my table is almost 140GB. My Elasticsearch cluster (3 data nodes, each with 8 cores and 16GB of memory) is separate from the Hive cluster. I want to load the data from Hive into ES following the Apache Hive integration guide.

The following is my HiveQL script:

add jar elasticsearch-hadoop-5.2.2.jar;
drop table database_X.artists;
CREATE EXTERNAL TABLE database_X.artists(
user_id string, 
province int ,
...
col34 string)  -- the table has 34 columns
stored by 'org.elasticsearch.hadoop.hive.EsStorageHandler'
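tblproperties(
  'es.resource' = 'dillon_pengcz/artists',  -- index/type to write to
  'es.nodes' = '172.21.8.24',              -- ES node the connector contacts first
  'es.index.auto.create' = 'true',
  'es.mapping.id' = 'caa_id',              -- column used as the document _id
  'es.batch.size.entries' = '0',           -- 0 disables the per-entry batch limit
  'es.batch.size.bytes' = '4mb');          -- send a bulk request once ~4mb is buffered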
insert overwrite table database_X.artists select * from database_X.artists_src;
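To make the two batch settings concrete, here is a minimal Python sketch of the flush rule as the elasticsearch-hadoop configuration docs describe it: a bulk request is sent once either es.batch.size.entries or es.batch.size.bytes is reached, and 0 for entries disables that limit. This is an illustrative model, not the connector's code; the function name batch_documents and reading "4mb" as 4 MiB are assumptions. The docs also note these limits apply per task, so every Hive task writing in parallel sends its own stream of bulk requests.

```python
from typing import Iterable, List


def batch_documents(docs: Iterable[bytes], max_entries: int, max_bytes: int) -> List[List[bytes]]:
    """Group serialized documents into bulk batches (hypothetical model of the flush rule).

    max_entries == 0 disables the entry-count limit, mirroring es.batch.size.entries=0.
    A batch is flushed as soon as it reaches max_entries or max_bytes.
    """
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    current_bytes = 0
    for doc in docs:
        current.append(doc)
        current_bytes += len(doc)
        entries_full = max_entries > 0 and len(current) >= max_entries
        bytes_full = current_bytes >= max_bytes
        if entries_full or bytes_full:
            batches.append(current)
            current, current_bytes = [], 0
    if current:  # flush whatever is left when the task finishes
        batches.append(current)
    return batches


if __name__ == "__main__":
    FOUR_MB = 4 * 1024 * 1024  # assumption: "4mb" interpreted as 4 MiB
    docs = [b"x" * 1000] * 10_000  # 10,000 hypothetical 1 KB documents
    sizes = [len(b) for b in batch_documents(docs, max_entries=0, max_bytes=FOUR_MB)]
    print(sizes)  # [4195, 4195, 1610]
```

With the entry limit disabled, only the byte threshold decides when a request goes out, so request count scales with data volume rather than row count.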

'172.21.8.24' is my ES master node IP.

Recently I have not been able to run the above script successfully, so I tested it on 100,000 records using LIMIT, which worked:
insert overwrite table database_X.artists select * from database_X.artists_src limit 100000;
Then I used tcpflow -p -c -i eth1 port 9200 to see what was happening, and found something that differs from my understanding:

On my master node 172.21.8.24, I saw many POST bulk requests like the following:
172.021.008.024.56340-172.021.008.034.09200: POST /_bulk HTTP/1.1^M
User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.19.1 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2^M
Host: 172.21.8.34:9200^M
Accept: */*^M
Content-Length: 17500403^M
Content-Type: application/x-www-form-urlencoded^M
Expect: 100-continue^M
^M

172.021.008.024.09200-172.021.008.034.56340: HTTP/1.1 100 Continue^M
^M
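tcpflow labels each flow as source-destination, writing every endpoint as four zero-padded octets followed by a five-digit port. The small helper below (hypothetical, not part of tcpflow) decodes such a label, which makes the direction explicit: in the first header above, 172.21.8.24 is the client sending POST /_bulk to port 9200 on 172.21.8.34, and the second header is the reply coming back. A hedged observation: the User-Agent is curl/7.19.7, while the connector talks to Elasticsearch through its own HTTP client (the org.elasticsearch.hadoop.rest.commonshttp package mentioned below), so these requests may not originate from the Hive job at all.

```python
import re
from typing import NamedTuple, Tuple


class Endpoint(NamedTuple):
    ip: str
    port: int


# tcpflow flow label: AAA.BBB.CCC.DDD.PPPPP-AAA.BBB.CCC.DDD.PPPPP (zero-padded)
FLOW_RE = re.compile(r"^(\d{3}(?:\.\d{3}){3})\.(\d{5})-(\d{3}(?:\.\d{3}){3})\.(\d{5})")


def _endpoint(padded_ip: str, port: str) -> Endpoint:
    # Strip the zero padding from each octet: "172.021.008.024" -> "172.21.8.24"
    ip = ".".join(str(int(octet)) for octet in padded_ip.split("."))
    return Endpoint(ip, int(port))


def parse_flow(line: str) -> Tuple[Endpoint, Endpoint]:
    """Return (source, destination) parsed from a tcpflow console header line."""
    m = FLOW_RE.match(line)
    if m is None:
        raise ValueError(f"not a tcpflow flow header: {line!r}")
    return _endpoint(m.group(1), m.group(2)), _endpoint(m.group(3), m.group(4))


if __name__ == "__main__":
    header = "172.021.008.024.56340-172.021.008.034.09200: POST /_bulk HTTP/1.1"
    src, dst = parse_flow(header)
    print(f"{src.ip}:{src.port} -> {dst.ip}:{dst.port}")  # 172.21.8.24:56340 -> 172.21.8.34:9200
```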

What is the error message? From a quick look, it seems that 140GB of data in one go may be too much for the cluster to handle.
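A rough back-of-the-envelope estimate gives a sense of the scale. It assumes the JSON sent to Elasticsearch is about the same size as the Hive table on disk (only an approximation; compressed storage formats can make the JSON considerably larger) and reads "4mb" as 4 MiB. It also compares the configured batch size with the Content-Length seen in the capture above.

```python
import math

GIB = 1024 ** 3
MIB = 1024 ** 2


def estimated_bulk_requests(total_bytes: int, batch_bytes: int) -> int:
    """Rough count of bulk requests when only the byte limit triggers a flush."""
    return math.ceil(total_bytes / batch_bytes)


if __name__ == "__main__":
    total = 140 * GIB   # assumption: JSON payload roughly the size of the Hive table
    batch = 4 * MIB     # es.batch.size.bytes = 4mb, read as 4 MiB
    print(estimated_bulk_requests(total, batch))  # 35840
    captured = 17_500_403  # Content-Length from the tcpflow capture
    print(round(captured / batch, 2))  # 4.17
```

So the full load means tens of thousands of bulk requests in total, and the captured request is more than four times the configured 4mb batch, which is another hint that it may not have been produced with these settings.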

@Dillon_Peng Could you enable trace-level logging for the org.elasticsearch.hadoop.rest.commonshttp package and post the logs here? I suspect something may be off with your discovery settings. Is your master node a dedicated master, or is it also acting as a data node?

Hi James and Jimmy,
Sorry for not replying sooner; I was on a short three-day holiday!
I finally found that the problem was caused by a misconfiguration on my side. To test ES-Hive performance, I had copied the entire elasticsearch-5.2.2 directory into my separate test cluster (of course, I changed the obvious settings, such as the IPs of the original cluster). Later I built a new test cluster from scratch using elasticsearch-5.2.2.tar.gz, and the strange POST requests disappeared.
