Issues with elasticsearch-hadoop-2.0.0 and Hive (Cloudera and Hortonworks), help needed


(elitem way) #1

I am learning elasticsearch-hadoop and have a few issues that I do not
understand. I am using ES 1.12 on Windows, elasticsearch-hadoop-2.0.0 and the
cloudera-quickstart-vm-5.0.0-0-vmware sandbox with Hive.

  1. I loaded only 6 rows into the ES index cars/transactions. Why did Hive
    return 14 rows instead? See below.
  2. "select count(*) from cars2" failed with return code 2. "Group by" and
    "sum" also failed. Did I miss anything? Similar queries succeed when using
    the sample_07 and sample_08 tables that come with Hive.
  3. elasticsearch-hadoop-2.0.0 does not seem to work with jetty - the
    authentication plugin. I got errors when I enabled jetty and set 'es.nodes'
    = 'superuser:admin@192.168.128.1'.
  4. I could not pipe data from Hive to Elasticsearch either.

--ISSUE 1:
--load data to ES
POST: http://localhost:9200/cars/transactions/_bulk
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
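
A note on the request above: the `_bulk` endpoint reads newline-delimited JSON, so each action and each document has to sit on a single line of its own, and the body must end with a newline. The following is only a minimal Python sketch of building such a body. `build_bulk_body` is a hypothetical helper name, not part of any Elasticsearch client, and only two of the six documents are shown.

```python
import json


def build_bulk_body(docs):
    """Serialize docs into the newline-delimited format the _bulk endpoint expects."""
    lines = []
    for doc in docs:
        # Action line: an empty "index" action takes index and type from the URL.
        lines.append(json.dumps({"index": {}}))
        # Source line: the whole document stays on one line.
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


cars = [
    {"price": 30000, "color": "green", "make": "ford", "sold": "2014-05-18"},
    {"price": 15000, "color": "blue", "make": "toyota", "sold": "2014-07-02"},
]
body = build_bulk_body(cars)
print(body, end="")
```

The resulting string would be sent as the body of the POST to `/cars/transactions/_bulk`.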

CREATE EXTERNAL TABLE cars2 (color STRING, make STRING, price BIGINT, sold
TIMESTAMP)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'cars/transactions',
'es.nodes' = '192.168.128.1', 'es.port'='9200');

HIVE: select * from cars2;
14 rows returned.

color make price sold
0 red honda 20000 2014-11-05 00:00:00.0
1 red honda 10000 2014-10-28 00:00:00.0
2 green ford 30000 2014-05-18 00:00:00.0
3 green toyota 12000 2014-08-19 00:00:00.0
4 blue ford 25000 2014-02-12 00:00:00.0
5 blue toyota 15000 2014-07-02 00:00:00.0
6 red bmw 80000 2014-01-01 00:00:00.0
7 red honda 10000 2014-10-28 00:00:00.0
8 blue toyota 15000 2014-07-02 00:00:00.0
9 red honda 20000 2014-11-05 00:00:00.0
10 green ford 30000 2014-05-18 00:00:00.0
11 green toyota 12000 2014-08-19 00:00:00.0
12 red honda 20000 2014-11-05 00:00:00.0
13 red honda 20000 2014-11-05 00:00:00.0
14 red bmw 80000 2014-01-01 00:00:00.0

ISSUE2:

HIVE: select count(*) from cars2;

Your query has the following error(s):
Error while processing statement: FAILED: Execution Error, return code 2
from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

--ISSUE 4:

CREATE EXTERNAL TABLE test1 (
description STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.host' = '192.168.128.1', 'es.port'='9200', 'es.resource'
= 'test1');

INSERT OVERWRITE TABLE test1 select description from sample_07;

Your query has the following error(s):

Error while processing statement: FAILED: Execution Error, return code 2
from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8c642665-424a-48be-bc5d-8625b94243c0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(elitem way) #2

Here is the Hive log when running the "select count(*) from cars2;":

application_1402243729361_0009

14/06/08 10:27:19 INFO log.PerfLogger:
14/06/08 10:27:19 INFO log.PerfLogger:
14/06/08 10:27:19 INFO parse.ParseDriver: Parsing command: select count(*)
from cars2
14/06/08 10:27:19 INFO parse.ParseDriver: Parse Completed
14/06/08 10:27:19 INFO log.PerfLogger: </PERFLOG method=parse
start=1402248439935 end=1402248439936 duration=1
from=org.apache.hadoop.hive.ql.Driver>
14/06/08 10:27:19 INFO log.PerfLogger:
14/06/08 10:27:19 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
14/06/08 10:27:19 INFO parse.SemanticAnalyzer: Completed phase 1 of
Semantic Analysis
14/06/08 10:27:19 INFO parse.SemanticAnalyzer: Get metadata for source
tables
14/06/08 10:27:19 INFO parse.SemanticAnalyzer: Get metadata for subqueries
14/06/08 10:27:19 INFO parse.SemanticAnalyzer: Get metadata for destination
tables
14/06/08 10:27:19 INFO ql.Context: New scratch dir is
hdfs://localhost.localdomain:8020/tmp/hive-hive/hive_2014-06-08_10-27-19_935_2002159029513445063-1
14/06/08 10:27:19 INFO parse.SemanticAnalyzer: Completed getting MetaData
in Semantic Analysis
14/06/08 10:27:19 INFO ppd.OpProcFactory: Processing for FS(18)
14/06/08 10:27:19 INFO ppd.OpProcFactory: Processing for SEL(17)
14/06/08 10:27:19 INFO ppd.OpProcFactory: Processing for GBY(16)
14/06/08 10:27:19 INFO ppd.OpProcFactory: Processing for RS(15)
14/06/08 10:27:19 INFO ppd.OpProcFactory: Processing for GBY(14)
14/06/08 10:27:19 INFO ppd.OpProcFactory: Processing for SEL(13)
14/06/08 10:27:19 INFO ppd.OpProcFactory: Processing for TS(12)
14/06/08 10:27:19 INFO log.PerfLogger:
14/06/08 10:27:19 INFO log.PerfLogger: </PERFLOG
method=partition-retrieving start=1402248439988 end=1402248439989
duration=1 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
14/06/08 10:27:19 INFO physical.MetadataOnlyOptimizer: Looking for table
scans where optimization is applicable
14/06/08 10:27:19 INFO physical.MetadataOnlyOptimizer: Found 0 metadata
only table scans
14/06/08 10:27:19 INFO parse.SemanticAnalyzer: Completed plan generation
14/06/08 10:27:19 INFO ql.Driver: Semantic Analysis Completed
14/06/08 10:27:19 INFO log.PerfLogger: </PERFLOG method=semanticAnalyze
start=1402248439936 end=1402248439990 duration=54
from=org.apache.hadoop.hive.ql.Driver>
14/06/08 10:27:19 INFO exec.ListSinkOperator: Initializing Self 19 OP
14/06/08 10:27:19 INFO exec.ListSinkOperator: Operator 19 OP initialized
14/06/08 10:27:19 INFO exec.ListSinkOperator: Initialization Done 19 OP
14/06/08 10:27:19 INFO ql.Driver: Returning Hive schema:
Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)],
properties:null)
14/06/08 10:27:19 INFO log.PerfLogger: </PERFLOG method=compile
start=1402248439935 end=1402248439991 duration=56
from=org.apache.hadoop.hive.ql.Driver>
14/06/08 10:27:19 INFO Configuration.deprecation:
mapred.input.dir.recursive is deprecated. Instead, use
mapreduce.input.fileinputformat.input.dir.recursive
14/06/08 10:27:19 INFO log.PerfLogger:
14/06/08 10:27:19 INFO log.PerfLogger:
14/06/08 10:27:19 INFO ql.Driver: Creating lock manager of type
org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
14/06/08 10:27:19 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost.localdomain:2181 sessionTimeout=600000
watcher=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager$DummyWatcher@d699a84
14/06/08 10:27:19 INFO log.PerfLogger:
14/06/08 10:27:20 INFO log.PerfLogger: </PERFLOG
method=acquireReadWriteLocks start=1402248439999 end=1402248440012
duration=13 from=org.apache.hadoop.hive.ql.Driver>
14/06/08 10:27:20 INFO log.PerfLogger:
14/06/08 10:27:20 INFO ql.Driver: Starting command: select count(*) from
cars2
14/06/08 10:27:20 INFO ql.Driver: Total MapReduce jobs = 1
14/06/08 10:27:20 INFO log.PerfLogger: </PERFLOG method=TimeToSubmit
start=1402248439992 end=1402248440012 duration=20
from=org.apache.hadoop.hive.ql.Driver>
14/06/08 10:27:20 INFO log.PerfLogger:
14/06/08 10:27:20 INFO log.PerfLogger:
14/06/08 10:27:20 INFO ql.Driver: Launching Job 1 out of 1
14/06/08 10:27:20 INFO exec.Task: Number of reduce tasks determined at
compile time: 1
14/06/08 10:27:20 INFO exec.Task: In order to change the average load for a
reducer (in bytes):
14/06/08 10:27:20 INFO exec.Task: set
hive.exec.reducers.bytes.per.reducer=
14/06/08 10:27:20 INFO exec.Task: In order to limit the maximum number of
reducers:
14/06/08 10:27:20 INFO exec.Task: set hive.exec.reducers.max=
14/06/08 10:27:20 INFO exec.Task: In order to set a constant number of
reducers:
14/06/08 10:27:20 INFO exec.Task: set mapred.reduce.tasks=
14/06/08 10:27:20 INFO ql.Context: New scratch dir is
hdfs://localhost.localdomain:8020/tmp/hive-hive/hive_2014-06-08_10-27-19_935_2002159029513445063-3
14/06/08 10:27:20 INFO Configuration.deprecation:
mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
mapreduce.reduce.speculative
14/06/08 10:27:20 INFO mr.ExecDriver: Using
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
14/06/08 10:27:20 INFO mr.ExecDriver: adding libjars:
file:///opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hive/lib/hive-hbase-handler-0.12.0-cdh5.0.0.jar,file:///opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hbase/hbase-protocol.jar,file:///opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hbase/hbase-client.jar,file:///opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hbase/lib/htrace-core.jar,file:///opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hbase/lib/htrace-core-2.01.jar,file:///opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hbase/hbase-hadoop-compat.jar,file:///opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hbase/hbase-hadoop2-compat.jar,file:///opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hbase/hbase-server.jar,file:///opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hbase/hbase-common.jar
14/06/08 10:27:20 INFO exec.Utilities: Processing alias cars2
14/06/08 10:27:20 INFO exec.Utilities: Adding input file
hdfs://localhost.localdomain:8020/user/hive/warehouse/cars2
14/06/08 10:27:20 INFO exec.Utilities: Content Summary not cached for
hdfs://localhost.localdomain:8020/user/hive/warehouse/cars2
14/06/08 10:27:20 INFO ql.Context: New scratch dir is
hdfs://localhost.localdomain:8020/tmp/hive-hive/hive_2014-06-08_10-27-19_935_2002159029513445063-3
14/06/08 10:27:20 INFO log.PerfLogger:
14/06/08 10:27:20 INFO exec.Utilities: Serializing MapWork via kryo
14/06/08 10:27:20 INFO log.PerfLogger: </PERFLOG method=serializePlan
start=1402248440040 end=1402248440067 duration=27
from=org.apache.hadoop.hive.ql.exec.Utilities>
14/06/08 10:27:20 INFO log.PerfLogger:
14/06/08 10:27:20 INFO exec.Utilities: Serializing ReduceWork via kryo
14/06/08 10:27:20 INFO log.PerfLogger: </PERFLOG method=serializePlan
start=1402248440074 end=1402248440096 duration=22
from=org.apache.hadoop.hive.ql.exec.Utilities>
14/06/08 10:27:20 INFO client.RMProxy: Connecting to ResourceManager at
localhost.localdomain/127.0.0.1:8032
14/06/08 10:27:20 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:20 INFO client.RMProxy: Connecting to ResourceManager at
localhost.localdomain/127.0.0.1:8032
14/06/08 10:27:20 WARN mapreduce.JobSubmitter: Hadoop command-line option
parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this.
14/06/08 10:27:20 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:20 INFO log.PerfLogger:
14/06/08 10:27:20 INFO log.PerfLogger:
14/06/08 10:27:20 INFO mr.EsInputFormat: Reading from [cars/transactions]
14/06/08 10:27:20 INFO mr.EsInputFormat: Discovered mapping
{cars=[mappings=[transactions=[color=STRING, make=STRING, price=LONG,
sold=DATE]]]} for [cars/transactions]
14/06/08 10:27:20 INFO mr.EsInputFormat: Created [5] shard-splits
14/06/08 10:27:20 INFO io.HiveInputFormat: number of splits 5
14/06/08 10:27:20 INFO log.PerfLogger: </PERFLOG method=getSplits
start=1402248440609 end=1402248440652 duration=43
from=org.apache.hadoop.hive.ql.io.HiveInputFormat>
14/06/08 10:27:20 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:21 INFO mapreduce.JobSubmitter: number of splits:5
14/06/08 10:27:21 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1402243729361_0009
14/06/08 10:27:21 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:21 INFO impl.YarnClientImpl: Submitted application
application_1402243729361_0009
14/06/08 10:27:21 INFO mapreduce.Job: The url to track the job:
http://localhost.localdomain:8088/proxy/application_1402243729361_0009/
14/06/08 10:27:21 INFO exec.Task: Starting Job = job_1402243729361_0009,
Tracking URL =
http://localhost.localdomain:8088/proxy/application_1402243729361_0009/
14/06/08 10:27:21 INFO exec.Task: Kill Command =
/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/bin/hadoop job
-kill job_1402243729361_0009
14/06/08 10:27:21 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:22 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:23 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:24 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:25 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:26 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:27 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:27 INFO exec.Task: Hadoop job information for Stage-1:
number of mappers: 5; number of reducers: 1
14/06/08 10:27:27 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
14/06/08 10:27:27 INFO exec.Task: 2014-06-08 10:27:27,858 Stage-1 map =
0%, reduce = 0%
14/06/08 10:27:28 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:29 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:31 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:33 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:34 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:36 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:38 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:40 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:42 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:44 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:46 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:48 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:51 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:53 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:55 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:57 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:27:59 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:28:01 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:28:03 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:28:05 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:28:08 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:28:10 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:28:12 INFO exec.Task: 2014-06-08 10:28:12,001 Stage-1 map =
100%, reduce = 100%
14/06/08 10:28:12 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()
14/06/08 10:28:13 ERROR exec.Task: Ended Job = job_1402243729361_0009 with
errors
14/06/08 10:28:13 INFO impl.YarnClientImpl: Killed application
application_1402243729361_0009
14/06/08 10:28:13 INFO log.PerfLogger: </PERFLOG method=task.MAPRED.Stage-1
start=1402248440012 end=1402248493129 duration=53117
from=org.apache.hadoop.hive.ql.Driver>
14/06/08 10:28:13 ERROR ql.Driver: FAILED: Execution Error, return code 2
from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
14/06/08 10:28:13 INFO log.PerfLogger: </PERFLOG method=Driver.execute
start=1402248440012 end=1402248493130 duration=53118
from=org.apache.hadoop.hive.ql.Driver>
14/06/08 10:28:13 INFO ql.Driver: MapReduce Jobs Launched:
14/06/08 10:28:13 WARN mapreduce.Counters: Group FileSystemCounters is
deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
14/06/08 10:28:13 INFO ql.Driver: Job 0: Map: 5 Reduce: 1 HDFS Read: 0
HDFS Write: 0 FAIL
14/06/08 10:28:13 INFO ql.Driver: Total MapReduce CPU Time Spent: 0 msec
14/06/08 10:28:13 INFO log.PerfLogger:
14/06/08 10:28:13 INFO ZooKeeperHiveLockManager: about to release lock for
default/cars2
14/06/08 10:28:13 INFO ZooKeeperHiveLockManager: about to release lock for
default
14/06/08 10:28:13 INFO log.PerfLogger: </PERFLOG method=releaseLocks
start=1402248493131 end=1402248493141 duration=10
from=org.apache.hadoop.hive.ql.Driver>
14/06/08 10:28:13 ERROR operation.Operation: Error:
org.apache.hive.service.cli.HiveSQLException: Error while processing
statement: FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:146)
at
org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:64)
at
org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:177)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/06/08 10:28:14 INFO cli.CLIService: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=ef7105d2-e323-44f4-b1b8-3e2aca12b198]:
getOperationStatus()


(Costin Leau) #3

Hi,

Sorry for the delayed response; travel and other things got in the way. I tried replicating the issue on my end and
couldn't; see below:

On 6/8/14 8:03 PM, elitem way wrote:

I am learning elasticsearch-hadoop and have a few issues that I do not understand. I am using ES 1.12 on Windows,
elasticsearch-hadoop-2.0.0 and the cloudera-quickstart-vm-5.0.0-0-vmware sandbox with Hive.

  1. I loaded only 6 rows into the ES index cars/transactions. Why did Hive return 14 rows instead? See below.
  2. "select count(*) from cars2" failed with return code 2. "Group by" and "sum" also failed. Did I miss anything?
    Similar queries succeed when using the sample_07 and sample_08 tables that come with Hive.
  3. elasticsearch-hadoop-2.0.0 does not seem to work with jetty - the authentication plugin. I got errors when I
    enabled jetty and set 'es.nodes' = 'superuser:admin@192.168.128.1'.
  4. I could not pipe data from Hive to Elasticsearch either.

--ISSUE 1:
--load data to ES
POST: http://localhost:9200/cars/transactions/_bulk
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }

CREATE EXTERNAL TABLE cars2 (color STRING, make STRING, price BIGINT, sold TIMESTAMP)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'cars/transactions',
'es.nodes' = '192.168.128.1', 'es.port'='9200');

HIVE: select * from cars2;
14 rows returned.

color make price sold
0 red honda 20000 2014-11-05 00:00:00.0
1 red honda 10000 2014-10-28 00:00:00.0
2 green ford 30000 2014-05-18 00:00:00.0
3 green toyota 12000 2014-08-19 00:00:00.0
4 blue ford 25000 2014-02-12 00:00:00.0
5 blue toyota 15000 2014-07-02 00:00:00.0
6 red bmw 80000 2014-01-01 00:00:00.0
7 red honda 10000 2014-10-28 00:00:00.0
8 blue toyota 15000 2014-07-02 00:00:00.0
9 red honda 20000 2014-11-05 00:00:00.0
10 green ford 30000 2014-05-18 00:00:00.0
11 green toyota 12000 2014-08-19 00:00:00.0
12 red honda 20000 2014-11-05 00:00:00.0
13 red honda 20000 2014-11-05 00:00:00.0
14 red bmw 80000 2014-01-01 00:00:00.0

It looks like you are adding data to localhost:9200 but querying 192.168.128.1:9200 - most likely these are different
nodes, hence the different data set. To double-check, run a query/count through curl against ES and then check the
data through Hive - that's what we do in our tests.
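
The same double-check can be scripted instead of run through curl. This is a rough Python sketch, assuming both addresses answer on port 9200 and that the `_count` endpoint returns a JSON body with a `count` field. The helper names (`count_url`, `parse_count`, `fetch_count`, `compare_counts`) are made up for illustration.

```python
import json
import urllib.request


def count_url(host, port, resource):
    """Build the _count URL for an "index/type" resource."""
    return "http://{}:{}/{}/_count".format(host, port, resource)


def parse_count(body):
    """Extract the document count from a _count response body."""
    return json.loads(body)["count"]


def fetch_count(host, port=9200, resource="cars/transactions"):
    """Ask one node how many documents it holds for the resource."""
    with urllib.request.urlopen(count_url(host, port, resource)) as resp:
        return parse_count(resp.read().decode("utf-8"))


def compare_counts(hosts, port=9200, resource="cars/transactions"):
    """Return {host: count} so that differing data sets are easy to spot."""
    return {host: fetch_count(host, port, resource) for host in hosts}
```

Calling `compare_counts(["localhost", "192.168.128.1"])` on the machine that loaded the data shows whether the two addresses report the same number of documents. Keep in mind that `localhost` inside the VM refers to the VM itself, not to the Windows host.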

ISSUE2:

HIVE: select count(*) from cars2;

Your query has the following error(s):
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Again, since you are querying a different host it's hard to tell what the issue is. count(*) works in our tests, but I've
seen cases where count fails when dealing with the newly introduced types (like timestamp). You can use count(1) as an
alternative, which should work just fine.

--ISSUE 4:

CREATE EXTERNAL TABLE test1 (
description STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.host' = '192.168.128.1', 'es.port'='9200', 'es.resource' = 'test1');

INSERT OVERWRITE TABLE test1 select description from sample_07;

Your query has the following error(s):

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

That is because you have an invalid table definition; the resource needs to point to an "index/type", not just an index -
if you look deep into the Hive exception, you should be able to see the actual validation message. Since Hive executes
things lazily and on the server side, there's no other way of reporting the error to the user...
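
Since the error only surfaces once the job runs, a quick check of the resource string before creating the table can save a round trip. The snippet below is just an illustrative Python sketch of the "index/type" rule described above. `is_index_type` is a hypothetical helper and is not the validation code the connector itself runs.

```python
def is_index_type(resource):
    """Return True when resource has exactly the "index/type" shape."""
    parts = resource.split("/")
    # Exactly two non-empty parts: the index name and the type name.
    return len(parts) == 2 and all(part.strip() for part in parts)


for candidate in ("cars/transactions", "test1"):
    print(candidate, is_index_type(candidate))
```

Under this check, 'cars/transactions' passes while 'test1' does not, so the test1 table would need a resource with a type after the index name.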

Hope this helps,

--
Costin


(elitem way) #4

Thank you for the response. localhost and 192.168.128.1 are actually the same ES host; I installed ES and the Cloudera VM on XP. I will try your suggestion and report back. I will also try the table without the timestamp column.


(system) #5