Hadoop and ES connectivity

impu.vasudev · July 11, 2018, 10:18am

Hi all,

ES is installed on one of my nodes and I am running a trial version of ES, but I am unable to connect it to Hadoop.
My question is: can we establish connectivity between ES and Hadoop on the trial version?

james.baiera · July 25, 2018, 2:49pm

ES-Hadoop should be able to connect to ES regardless of trial status or licensing, as long as it is a supported version of ES. Make sure that the version of ES-Hadoop matches the version of Elasticsearch you are using as closely as possible, and that any authentication features you are using are configured properly in ES-Hadoop. Additionally, if you have any error messages or exceptions to share, please post them here and we can take a look!
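
For readers wondering what "configure them properly" might look like in Hive: below is a minimal sketch, assuming HTTP basic authentication is enabled on the cluster. The table name `secured_test`, the resource `secured/doc`, the host `es-host.example`, and the credentials are all hypothetical placeholders, not values from this thread. `es.net.http.auth.user` and `es.net.http.auth.pass` are the connector's basic-auth settings.

```sql
-- Hypothetical table: name, resource, host and credentials are placeholders.
-- The two es.net.http.auth.* properties carry the basic-auth credentials
-- that the connector sends to Elasticsearch.
CREATE EXTERNAL TABLE secured_test (
  POLID INT,
  NAME  STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'secured/doc',
  'es.nodes' = 'es-host.example',
  'es.port' = '9200',
  'es.net.http.auth.user' = 'hive_writer',
  'es.net.http.auth.pass' = 'changeme'
);
```

If the cluster has no security features enabled, these two properties can simply be left out.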

impu.vasudev · July 26, 2018, 5:21am

Hi James, thanks for replying.

These are my queries. They all run fine, but I am unable to run the SELECT statement, and the index was not created in Elasticsearch.

elasticsearch version - 6.3.2
Kibana Version - 6.3.2
es-hadoop Version - 6.3.2

es node - x.x.x.113
hive node - x.x.x.62

CREATE TABLE source (
  POLID INT,
  NAME STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/impu/csv/';

CREATE EXTERNAL TABLE source_test (
  POLID INT,
  NAME STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'source/line',
  'es.index.auto.create' = 'TRUE',
  'es.nodes' = ' x.x.x.113',
  'es.port' = '9200',
  'es.nodes.discovery' = 'true',
  'es.nodes.wan.only' = 'false'
);

INSERT OVERWRITE TABLE source_test
SELECT POLID, NAME FROM source;

SELECT * FROM source_test;

Error after running the SELECT statement:

Bad status for request TFetchResultsReq(fetchType=0, operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xce\x8aP\x16\xf7\xf8K\x9e\xb3\x14\xf4"\xc2K\x13-', guid='\xfeF\x92\xae\x86\xe6F\xff\x8d.A\x15@X\xbc\xf9')), orientation=4, maxRows=100): TFetchResultsResp(status=TStatus(errorCode=0, errorMessage='java.io.IOException: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Index [source/line] missing and settings [es.index.read.missing.as.empty] is set to false', sqlState=None, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:java.io.IOException: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Index [source/line] missing and settings [es.index.read.missing.as.empty] is set to false:25:24', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:463', 'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:294', 'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:769', 'sun.reflect.GeneratedMethodAccessor37:invoke::-1', 'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', 'java.lang.reflect.Method:invoke:Method.java:498', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', 'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', 'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', 'java.security.AccessController:doPrivileged:AccessController.java:-2', 'javax.security.auth.Subject:doAs:Subject.java:422', 'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1917', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59', 'com.sun.proxy.$Proxy23:fetchResults::-1', 'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:462', 
'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:694', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', 'java.lang.Thread:run:Thread.java:748', '*java.io.IOException:org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Index [source/line] missing and settings [es.index.read.missing.as.empty] is set to false:29:4', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:508', 'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:415', 'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:140', 'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:2069', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:458', '*org.elasticsearch.hadoop.EsHadoopIllegalArgumentException:Index [source/line] missing and settings [es.index.read.missing.as.empty] is set to false:35:6', 'org.elasticsearch.hadoop.rest.RestService:findPartitions:RestService.java:238', 'org.elasticsearch.hadoop.mr.EsInputFormat:getSplits:EsInputFormat.java:412', 'org.elasticsearch.hadoop.hive.EsHiveInputFormat:getSplits:EsHiveInputFormat.java:113', 'org.elasticsearch.hadoop.hive.EsHiveInputFormat:getSplits:EsHiveInputFormat.java:50', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextSplits:FetchOperator.java:363', 
'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:295', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:446'], statusCode=3), results=None, hasMoreRows=None)
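
The key part of the trace is `Index [source/line] missing and settings [es.index.read.missing.as.empty] is set to false`: the read fails because the target index does not exist. As a purely diagnostic sketch, a second table pointing at the same resource can enable that setting, so a read of a missing index comes back empty instead of raising the exception. The table name `source_test_lenient` is a hypothetical name introduced here for illustration. The other values are copied from the post above.

```sql
-- Hypothetical diagnostic table pointing at the same resource as source_test.
-- With es.index.read.missing.as.empty enabled, reading a missing index
-- yields an empty result instead of EsHadoopIllegalArgumentException.
CREATE EXTERNAL TABLE source_test_lenient (
  POLID INT,
  NAME  STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'source/line',
  'es.nodes' = 'x.x.x.113',
  'es.port' = '9200',
  'es.index.read.missing.as.empty' = 'true'
);

SELECT * FROM source_test_lenient;
```

An empty result here would only confirm that the index is absent, meaning the earlier INSERT did not write anything to Elasticsearch. It does not fix the write itself.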

james.baiera · July 26, 2018, 2:36pm

Is that insert operation actually sending any data to Elasticsearch? Does it have counters populated from the connector in the job output in Hadoop?

impu.vasudev · July 26, 2018, 2:39pm

Yes, the source table has the data, which is present in that CSV table.

james.baiera · July 26, 2018, 3:30pm

No, I mean, on the Hive job that is executed to perform the insert, were there any job counters that were made available? Can you turn on trace logging in the org.elasticsearch.hadoop.rest.commonshttp package to see what is being sent to the Elasticsearch server?

impu.vasudev · August 3, 2018, 10:54am

Hi James,

Thank you, the issue got solved.

