[hadoop] Getting elasticsearch-hadoop working with Shark


(Max Lang) #1

I set everything up using this
guide: https://github.com/amplab/shark/wiki/Running-Shark-on-EC2 on an ec2
cluster. I've copied the elasticsearch-hadoop jars into the hive lib
directory and I have elasticsearch running on localhost:9200. I'm running
shark in a screen session with --service sharkserver and connecting to it
at the same time using shark -h localhost.

Unfortunately, when I attempt to write data into elasticsearch, it fails.
Here's an example:

[localhost:10000] shark> CREATE EXTERNAL TABLE wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://spark-data/wikipedia-sample/';
Time taken (including network latency): 0.159 seconds
14/02/19 01:23:33 INFO CliDriver: Time taken (including network latency): 0.159 seconds

[localhost:10000] shark> SELECT title FROM wiki LIMIT 1;
Alpokalja
Time taken (including network latency): 2.23 seconds
14/02/19 01:23:48 INFO CliDriver: Time taken (including network latency): 2.23 seconds

[localhost:10000] shark> CREATE EXTERNAL TABLE es_wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'wikipedia/article');
Time taken (including network latency): 0.061 seconds
14/02/19 01:33:51 INFO CliDriver: Time taken (including network latency): 0.061 seconds

[localhost:10000] shark> INSERT OVERWRITE TABLE es_wiki SELECT w.id, w.title, w.last_modified, w.xml, w.text FROM wiki w;
[Hive Error]: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
Time taken (including network latency): 3.575 seconds
14/02/19 01:34:42 INFO CliDriver: Time taken (including network latency): 3.575 seconds

The stack trace looks like this:

org.apache.hadoop.hive.ql.metadata.HiveException (org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Out of nodes and retries; caught exception)

org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:602)
shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:84)
shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:81)
scala.collection.Iterator$class.foreach(Iterator.scala:772)
scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:81)
shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:207)
shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
org.apache.spark.scheduler.Task.run(Task.scala:53)
org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)

I should be using Hive 0.9.0, shark 0.8.1, elasticsearch 1.0.0, Hadoop
1.0.4, and java 1.7.0_51.

Based on my cursory look at the hadoop and elasticsearch-hadoop sources, it
looks like hive is just rethrowing an IOException it's getting from Spark,
and elasticsearch-hadoop is just hitting those exceptions.

I suppose my questions are: Does this look like an issue with my
ES/elasticsearch-hadoop config? And has anyone gotten elasticsearch working
with Spark/Shark?

Any ideas/insights are appreciated.

Thanks,
Max

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9486faff-3eaf-4344-8931-3121bbc5d9c7%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Costin Leau) #2

The error indicates a network problem: es-hadoop cannot connect to Elasticsearch on the default HTTP address
(localhost:9200). Can you double-check whether that's indeed the case (using curl or even telnet on that port)? Maybe a
firewall is blocking the connection.
Also, you could try the latest Hive (0.12) and a more recent Hadoop, such as 1.1.2 or 1.2.1.

Additionally, can you enable TRACE logging in your job for the es-hadoop packages org.elasticsearch.hadoop.rest and
org.elasticsearch.hadoop.mr and report back?
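If curl or telnet is not at hand on the machine running the job, the same connectivity check can be sketched in a few lines of Python. This is only an illustration: the function name can_connect and the 3 second timeout are assumptions of this sketch, and localhost:9200 is simply the default address mentioned above. It should be run from the same machine that executes the Hive/Shark tasks.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures
        return False


if __name__ == "__main__":
    # localhost:9200 is the default address es-hadoop tries; adjust as needed
    print(can_connect("localhost", 9200))
```

A False result here only says that the port is not reachable from this machine; it does not say whether a firewall or a stopped Elasticsearch process is to blame.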

Thanks,


--
Costin



(Max Lang) #3

Hey Costin,

Thanks for the swift reply. I abandoned EC2 to take that out of the
equation and managed to get everything working locally using the latest
version of everything (though I realized just now I'm still on hive 0.9).
I'm guessing you're right about some port connection issue because I
definitely had ES running on that machine.

I changed hive-log4j.properties and added
#custom logging levels
#log4j.logger.xxx=DEBUG
log4j.logger.org.elasticsearch.hadoop.rest=TRACE
log4j.logger.org.elasticsearch.hadoop.mr=TRACE

But I didn't see any trace logging. Hopefully I can get it working on EC2
without issue, but, for the future, is this the correct way to set TRACE
logging?
Oh, and for reference, I tried running without ES up and I got the
following exceptions:

2014-02-19 13:46:08,803 ERROR shark.SharkDriver (Logging.scala:logError(64)) - FAILED: Hive Internal Error: java.lang.IllegalStateException(Cannot discover Elasticsearch version)
java.lang.IllegalStateException: Cannot discover Elasticsearch version
at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:101)
at org.elasticsearch.hadoop.hive.EsStorageHandler.configureOutputJobProperties(EsStorageHandler.java:83)
at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:706)
at org.apache.hadoop.hive.ql.plan.PlanUtils.configureOutputJobPropertiesForStorageHandler(PlanUtils.java:675)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.augmentPlan(FileSinkOperator.java:764)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.putOpInsertMap(SemanticAnalyzer.java:1518)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:4337)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6207)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6138)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6764)
at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:149)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:244)
at shark.SharkDriver.compile(SharkDriver.scala:215)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:895)
at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:324)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:232)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.io.IOException: Out of nodes and retries; caught exception
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:274)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:84)
at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:99)
... 18 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:280)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
... 25 more
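
Read bottom-up, the trace above bottoms out in "Connection refused": the IllegalStateException wraps an IOException, which in turn wraps the ConnectException. For long traces, a small helper can pull out that chain. The following is only a sketch; the function names cause_chain and root_cause are made up for this example, and SAMPLE is an abbreviated copy of the trace above.

```python
import re

# Matches lines such as "Caused by: java.net.ConnectException: Connection refused"
CAUSED_BY = re.compile(r"^\s*Caused by:\s*(?P<cls>[\w.$]+)(?::\s*(?P<msg>.*))?$")


def cause_chain(trace):
    """Return (exception class, message) pairs for each 'Caused by:' line, outermost first."""
    chain = []
    for line in trace.splitlines():
        match = CAUSED_BY.match(line)
        if match:
            chain.append((match.group("cls"), (match.group("msg") or "").strip()))
    return chain


def root_cause(trace):
    """Return the innermost cause, or None if the trace has no 'Caused by:' lines."""
    chain = cause_chain(trace)
    return chain[-1] if chain else None


# Abbreviated copy of the trace in this post
SAMPLE = """\
java.lang.IllegalStateException: Cannot discover Elasticsearch version
at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:101)
Caused by: java.io.IOException: Out of nodes and retries; caught exception
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
"""

if __name__ == "__main__":
    print(root_cause(SAMPLE))
```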

Let me know if there's anything in particular you'd like me to try on EC2.

(For posterity, the versions I used were: hadoop 2.2.0, hive 0.9.0, shark
0.8.1, spark 0.8.1, es-hadoop 1.3.0.M2, java 1.7.0_15, scala 2.9.3,
elasticsearch 1.0.0)

Thanks again,
Max



(Costin Leau) #4

Hi,

Setting up logging in Hive/Hadoop can be tricky, since the log4j configuration needs to be picked up by the running JVM;
otherwise you won't see anything.
Take a look at this link on how to tell Hive to use your logging settings [1].

For the next release, we might introduce dedicated exceptions, because some libraries, like Hive, swallow the stack
trace, which leaves a generic exception such as IllegalStateException ambiguous about what the actual issue is.

Let me know how it goes and whether you encounter any issues with Shark. Or if you don't :)

Thanks!

[1] https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs


--
Costin



(Max Lang) #5

I managed to get it working on ec2 without issue this time. I'd say the
biggest difference was that this time I set up a dedicated ES machine. Is
it possible that, because I was using a cluster with slaves, when I used
"localhost" the slaves couldn't find the ES instance running on the master?
Or do all the requests go through the master?
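
One way to test the "localhost" theory is to run a small reachability check on each slave, not only on the master. The sketch below is only an illustration: `can_reach` is a hypothetical helper (it is not part of Shark or elasticsearch-hadoop), and it assumes Elasticsearch listens on its default HTTP port 9200. It attempts nothing more than a TCP connection, much like testing the port with telnet.

```python
import socket


def can_reach(host, port=9200, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host name and opens the socket in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False
```

If `can_reach("localhost")` is True on the master but False on a slave, the slave is looking for Elasticsearch on itself rather than on the machine where it actually runs.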

On Wednesday, February 19, 2014 2:35:40 PM UTC-8, Costin Leau wrote:

Hi,

Setting logging in Hive/Hadoop can be tricky since the log4j needs to be picked up by the running JVM otherwise you
won't see anything.
Take a look at this link on how to tell Hive to use your logging settings [1].

For the next release, we might introduce dedicated exceptions for the simple fact that some libraries, like Hive,
swallow the stack trace and it's unclear what the issue is which makes the exception (IllegalStateException) ambiguous.

Let me know how it goes and whether you will encounter any issues with Shark. Or if you don't :)

Thanks!

[1] https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs


(Costin Leau) #6

Yeah, it might have been some sort of network configuration issue where services were running on different machines and
localhost pointed to a different location.
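
To illustrate the point: a name like "localhost" is resolved separately on every machine, so each node that reads that setting ends up talking to itself. The sketch below is a hypothetical helper (`is_loopback_host` is an illustrative name, not an es-hadoop API) that reports whether a configured host name resolves only to loopback addresses on the machine where it runs.

```python
import ipaddress
import socket


def is_loopback_host(hostname):
    """Return True if hostname resolves only to IPv4 loopback addresses (127.0.0.0/8)."""
    # gethostbyname_ex returns (canonical_name, aliases, ipv4_addresses).
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_loopback for addr in addresses
    )
```

If this returns True for the host the job is configured with, a task running on another machine will look for Elasticsearch on that machine itself.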

Either way, I'm glad to hear things are moving forward.

Cheers,

On 22/02/2014 1:06 AM, Max Lang wrote:

I managed to get it working on ec2 without issue this time. I'd say the biggest difference was that this time I set up a
dedicated ES machine. Is it possible that, because I was using a cluster with slaves, when I used "localhost" the slaves
couldn't find the ES instance running on the master? Or do all the requests go through the master?

--
Costin


(Nick Pentreath) #7

Hi

I am struggling to get this working too. I'm just trying locally for now,
running Shark 0.8.1, Hive 0.9.0 and ES 1.0.1 with ES-hadoop 1.3.0.M2.

I managed to get a basic example working with WRITING into an index. But
I'm really after READING an index.

I believe I have set everything up correctly. I've added the jar to Shark:

ADD JAR /path/to/es-hadoop.jar;

created a table:

CREATE EXTERNAL TABLE test_read (name string, price double)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'test_index/test_type/_search?q=*');

And then trying 'SELECT * FROM test_read' gives me:

org.apache.spark.SparkException: Job aborted: Task 3.0:0 failed more than 0 times; aborting job java.lang.ClassCastException: org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit cannot be cast to org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)

FAILED: Execution Error, return code -101 from shark.execution.SparkTask

In fact I get the same error thrown when trying to READ from the table that
I successfully WROTE to...
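
Since writing works, it may be worth confirming outside Shark that the same resource returns documents over plain HTTP. The sketch below is only an illustration: the function names are hypothetical, it assumes Elasticsearch answers on localhost:9200, and it assumes the usual search response layout with documents under `hits.hits[]._source`. It does not address the ClassCastException itself; it only helps rule out the Elasticsearch side.

```python
import json
from urllib.request import urlopen


def search_url(resource, host="localhost", port=9200):
    """Build the HTTP URL for an es.resource-style path such as 'index/type/_search?q=*'."""
    return "http://{}:{}/{}".format(host, port, resource.lstrip("/"))


def sources_from_response(body):
    """Extract the _source documents from a search response given as JSON text."""
    hits = json.loads(body).get("hits", {}).get("hits", [])
    return [hit.get("_source", {}) for hit in hits]


def fetch_sources(resource, host="localhost", port=9200, timeout=5.0):
    """Run the search over HTTP and return the matching documents."""
    with urlopen(search_url(resource, host, port), timeout=timeout) as response:
        return sources_from_response(response.read().decode("utf-8"))
```

Calling `fetch_sources("test_index/test_type/_search?q=*")` against a running node should return a list of dictionaries with the `name` and `price` fields if the index holds the expected data.
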
On Saturday, 22 February 2014 12:31:21 UTC+2, Costin Leau wrote:

Yeah, it might have been some sort of network configuration issue where
services were running on different machines and
localhost pointed to a different location.

Either way, I'm glad to hear things are moving forward.

Cheers,

> at 

org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)

> at 

org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)

> at 

org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)

> at 

org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)

> at 

org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)

> at 

org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)

> at 

org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)

> at 

org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)

> at 

org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)

> ... 25 more 
> 
> Let me know if there's anything in particular you'd like me to try 

on EC2.

> 
> (For posterity, the versions I used were: hadoop 2.2.0, hive 

0.9.0, shark 8.1, spark 8.1, es-hadoop 1.3.0.M2, java

> 1.7.0_15, scala 2.9.3, elasticsearch 1.0.0) 
> 
> Thanks again, 
> Max 
> 
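The trace above suggests that es-hadoop asks Elasticsearch for its version over HTTP before the job is planned, and that this request is what was refused. As a rough way to reproduce that step outside Hive/Shark, here is a minimal Python sketch. The function name `discover_es_version` is hypothetical and this is not es-hadoop's own code; it assumes the Elasticsearch root endpoint answers with JSON containing `version.number`, as Elasticsearch 1.0 does.

```python
import http.client
import json
import urllib.error
import urllib.request


def discover_es_version(host="localhost", port=9200, timeout=3.0):
    """Return the version string reported by the Elasticsearch root endpoint,
    or None if the node is unreachable or the reply is not the expected JSON."""
    url = "http://{}:{}/".format(host, port)
    # Talk to the node directly, ignoring any HTTP proxy set in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a non-JSON reply.
        return None
    if not isinstance(body, dict):
        return None
    version = body.get("version")
    return version.get("number") if isinstance(version, dict) else None
```

Calling `discover_es_version("localhost", 9200)` on the machine that runs the query should return something like the installed version string when Elasticsearch is up, and `None` in the "Connection refused" situation shown in the trace.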
> On Tuesday, February 18, 2014 10:16:38 PM UTC-8, Costin Leau wrote:
>
>     The error indicates a network error - namely es-hadoop cannot connect to Elasticsearch on the default (localhost:9200)
>     HTTP port. Can you double check whether that's indeed the case (using curl or even telnet on that port) - maybe the
>     firewall prevents any connections from being made...
>     Also you could try using the latest Hive, 0.12 and a more recent Hadoop such as 1.1.2 or 1.2.1.
>
>     Additionally, can you enable TRACE logging in your job on the es-hadoop packages org.elasticsearch.hadoop.rest and
>     org.elasticsearch.hadoop.mr and report back?
>
>     Thanks,
> 
> 
>     -- 
>     Costin 
> 
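The curl/telnet check suggested above can also be scripted so that it is easy to repeat on every machine that takes part in the job. The following is only a sketch under assumptions: the helper names `can_connect` and `check_hosts` are made up for illustration, and the host names you pass in are whatever your own cluster uses.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


def check_hosts(hosts, port=9200, timeout=3.0):
    """Map each candidate host name to whether its port accepts connections."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Running `check_hosts(["localhost", "<es-host>"])` on the master and on each worker shows which name actually reaches Elasticsearch from that machine; `localhost` only works on the machine where Elasticsearch itself is running.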

-- 
Costin 


--
Costin



(Costin Leau) #8

I recommend using master - there are several improvements done in this area. Also using the latest Shark (0.9.0) and
Hive (0.12) will help.

On 3/20/14 12:00 PM, Nick Pentreath wrote:

Hi

I am struggling to get this working too. I'm just trying locally for now, running Shark 0.8.1, Hive 0.9.0 and ES 1.0.1
with ES-hadoop 1.3.0.M2.

I managed to get a basic example working with WRITING into an index. But I'm really after READING an index.

I believe I have set everything up correctly. I've added the jar to Shark:
ADD JAR /path/to/es-hadoop.jar;

created a table:
CREATE EXTERNAL TABLE test_read (name string, price double)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'test_index/test_type/_search?q=*');

And then trying to 'SELECT * FROM test_read' gives me:

org.apache.spark.SparkException: Job aborted: Task 3.0:0 failed more than 0 times; aborting job
java.lang.ClassCastException: org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit cannot be cast to org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)

FAILED: Execution Error, return code -101 from shark.execution.SparkTask

In fact I get the same error thrown when trying to READ from the table that I successfully WROTE to...

On Saturday, 22 February 2014 12:31:21 UTC+2, Costin Leau wrote:

Yeah, it might have been some sort of network configuration issue where services were running on different machines and
localhost pointed to a different location.

Either way, I'm glad to hear things are moving forward.

Cheers,

On 22/02/2014 1:06 AM, Max Lang wrote:
> I managed to get it working on ec2 without issue this time. I'd say the biggest difference was that this time I set up a
> dedicated ES machine. Is it possible that, because I was using a cluster with slaves, when I used "localhost" the slaves
> couldn't find the ES instance running on the master? Or do all the requests go through the master?
>
>

--
Costin



(Nick Pentreath) #9

Thanks for the response.

I tried the latest Shark (cdh4 version of 0.9.1 here
http://cloudera.rst.im/shark/ ) - this uses hadoop 1.0.4 and hive 0.11 I
believe, and built elasticsearch-hadoop from github master.

Still getting the same error:
org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit cannot be cast
to org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit

Will using hive 0.11 / hadoop 1.0.4 vs hive 0.12 / hadoop 1.2.1 in
es-hadoop master make a difference?

Anyone else actually got this working?
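On the JVM, a ClassCastException in which a class "cannot be cast to" a class with the identical fully qualified name generally means that two copies of that class were loaded by different class loaders. Whether that is what happens here (for example, the es-hadoop jar being visible through more than one classpath) is an assumption and not confirmed in this thread. The effect itself can be illustrated with a small Python analogy, which is not the JVM mechanism: the same class definition is executed twice, and the two results share a name but are distinct types.

```python
import types

# The same class definition, executed twice in separate namespaces.
SOURCE = "class EsHiveSplit:\n    pass\n"


def load_copy(module_name):
    """Execute the class definition in a fresh module namespace."""
    module = types.ModuleType(module_name)
    exec(SOURCE, module.__dict__)
    return module


copy_a = load_copy("copy_a")
copy_b = load_copy("copy_b")

split = copy_a.EsHiveSplit()
same_name = copy_a.EsHiveSplit.__name__ == copy_b.EsHiveSplit.__name__
same_class = copy_a.EsHiveSplit is copy_b.EsHiveSplit
accepted = isinstance(split, copy_b.EsHiveSplit)
```

Here `same_name` is true while `same_class` and `accepted` are false, which mirrors the confusing error message: the names match, the types do not. If the JVM case has the same cause, making sure the jar is loaded from a single place would be the first thing to try.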

On Thu, Mar 20, 2014 at 2:44 PM, Costin Leau costin.leau@gmail.com wrote:

I recommend using master - there are several improvements done in this
area. Also using the latest Shark (0.9.0) and Hive (0.12) will help.

On 3/20/14 12:00 PM, Nick Pentreath wrote:

Hi

I am struggling to get this working too. I'm just trying locally for now,
running Shark 0.8.1, Hive 0.9.0 and ES 1.0.1
with ES-hadoop 1.3.0.M2.

I managed to get a basic example working with WRITING into an index. But
I'm really after READING and index.

I believe I have set everything up correctly, I've added the jar to Shark:
ADD JAR /path/to/es-hadoop.jar;

created a table:
CREATE EXTERNAL TABLE test_read (name string, price double)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'test_index/test_type/_search?q=*');

And then trying 'SELECT * FROM test_read' gives me:

org.apache.spark.SparkException: Job aborted: Task 3.0:0 failed more than 0 times; aborting job
java.lang.ClassCastException: org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit cannot be cast to org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)

FAILED: Execution Error, return code -101 from shark.execution.SparkTask

In fact I get the same error thrown when trying to READ from the table
that I successfully WROTE to...

On Saturday, 22 February 2014 12:31:21 UTC+2, Costin Leau wrote:

Yeah, it might have been some sort of network configuration issue where
services were running on different machines and localhost pointed to a
different location.
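
One way to test that theory is to run a small reachability check on each machine in the cluster, since "localhost" on a worker refers to that worker and not to the master. The sketch below is only an illustration: the `can_connect` helper is hypothetical (it is not part of es-hadoop or Shark) and it only checks that a TCP connection can be opened, not that Elasticsearch answers correctly.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within timeout.

    Hypothetical helper for illustration; it does not speak HTTP, it only
    confirms that something is listening and reachable from this machine.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False
```

Running `can_connect("localhost", 9200)` on a worker and on the master should show whether the workers can see the Elasticsearch HTTP port at the address the job is configured with.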

Either way, I'm glad to hear things are moving forward.

Cheers,

On 22/02/2014 1:06 AM, Max Lang wrote:
> I managed to get it working on ec2 without issue this time. I'd say the biggest difference was that this time I set up a
> dedicated ES machine. Is it possible that, because I was using a cluster with slaves, when I used "localhost" the slaves
> couldn't find the ES instance running on the master? Or do all the requests go through the master?
>
>
> On Wednesday, February 19, 2014 2:35:40 PM UTC-8, Costin Leau wrote:
>
> Hi,
>
> Setting logging in Hive/Hadoop can be tricky since the log4j needs to be picked up by the running JVM otherwise you
> won't see anything.
> Take a look at this link on how to tell Hive to use your logging settings [1].
>
> For the next release, we might introduce dedicated exceptions for the simple fact that some libraries, like Hive,
> swallow the stack trace and it's unclear what the issue is which makes the exception (IllegalStateException) ambiguous.
>
> Let me know how it goes and whether you will encounter any issues with Shark. Or if you don't :)
>
> Thanks!
>
> [1] https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs
>
> On 20/02/2014 12:02 AM, Max Lang wrote:
> > Hey Costin,
> >
> > Thanks for the swift reply. I abandoned EC2 to take that out of the equation and managed to get everything working
> > locally using the latest version of everything (though I realized just now I'm still on hive 0.9). I'm guessing you're
> > right about some port connection issue because I definitely had ES running on that machine.
> >
> > I changed hive-log4j.properties and added
> >
> > #custom logging levels
> > #log4j.logger.xxx=DEBUG
> > log4j.logger.org.elasticsearch.hadoop.rest=TRACE
> > log4j.logger.org.elasticsearch.hadoop.mr=TRACE
> >
> > But I didn't see any trace logging. Hopefully I can get it working on EC2 without issue, but, for the future, is this
> > the correct way to set TRACE logging?
> >
> > Oh and, for reference, I tried running without ES up and I got the following exceptions:
> >
> > 2014-02-19 13:46:08,803 ERROR shark.SharkDriver (Logging.scala:logError(64)) - FAILED: Hive Internal Error: java.lang.IllegalStateException(Cannot discover Elasticsearch version)
> > java.lang.IllegalStateException: Cannot discover Elasticsearch version
> > at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:101)
> > at org.elasticsearch.hadoop.hive.EsStorageHandler.configureOutputJobProperties(EsStorageHandler.java:83)
> > at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:706)
> > at org.apache.hadoop.hive.ql.plan.PlanUtils.configureOutputJobPropertiesForStorageHandler(PlanUtils.java:675)
> > at org.apache.hadoop.hive.ql.exec.FileSinkOperator.augmentPlan(FileSinkOperator.java:764)
> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.putOpInsertMap(SemanticAnalyzer.java:1518)
> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:4337)
> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6207)
> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6138)
> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6764)
> > at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:149)
> > at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:244)
> > at shark.SharkDriver.compile(SharkDriver.scala:215)
> > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
> > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:895)
> > at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:324)
> > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
> > at shark.SharkCliDriver$.main(SharkCliDriver.scala:232)
> > at shark.SharkCliDriver.main(SharkCliDriver.scala)
> > Caused by: java.io.IOException: Out of nodes and retries; caught exception
> > at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
> > at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
> > at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
> > at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
> > at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
> > at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:274)
> > at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:84)
> > at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:99)
> > ... 18 more
> > Caused by: java.net.ConnectException: Connection refused
> > at java.net.PlainSocketImpl.socketConnect(Native Method)
> > at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> > at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> > at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
> > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
> > at java.net.Socket.connect(Socket.java:579)
> > at java.net.Socket.connect(Socket.java:528)
> > at java.net.Socket.<init>(Socket.java:425)
> > at java.net.Socket.<init>(Socket.java:280)
> > at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
> > at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
> > at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
> > at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
> > at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
> > at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
> > at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
> > at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
> > at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
> > ... 25 more
> >
> > Let me know if there's anything in particular you'd like me to try on EC2.
> >
> > (For posterity, the versions I used were: hadoop 2.2.0, hive 0.9.0, shark 8.1, spark 8.1, es-hadoop 1.3.0.M2, java
> > 1.7.0_15, scala 2.9.3, elasticsearch 1.0.0)
> >
> > Thanks again,
> > Max
> >
> > On Tuesday, February 18, 2014 10:16:38 PM UTC-8, Costin Leau wrote:
> >
> > The error indicates a network error - namely es-hadoop cannot connect to Elasticsearch on the default (localhost:9200)
> > HTTP port. Can you double check whether that's indeed the case (using curl or even telnet on that port) - maybe the
> > firewall prevents any connections to be made...
> > Also you could try using the latest Hive, 0.12 and a more recent Hadoop such as 1.1.2 or 1.2.1.
> >
> > Additionally, can you enable TRACE logging in your job on es-hadoop packages org.elasticsearch.hadoop.rest and
> > org.elasticsearch.hadoop.mr packages and report back?
> >
> > Thanks,
> >
> >
> > --
> > Costin
> >
>
> --
> Costin
>

--
Costin


--
Costin




(Costin Leau) #10

Using the latest hive and hadoop is preferred as they contain various bug fixes.
The error suggests a classpath issue - namely the same class is loaded twice for some reason and hence the casting fails.
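
As a rough first check for that kind of duplication, one could scan the jars that end up on the classpath and see which of them contain the class named in the exception. This is only a sketch under assumptions: `jars_containing` is a hypothetical helper, and the list of jar paths (for example the Hive lib directory plus whatever was passed to ADD JAR) has to be supplied by hand. It will not detect the case where a single jar is loaded by two different class loaders.

```python
import zipfile


def jars_containing(class_name, jar_paths):
    """Return the jars in jar_paths that contain the given fully qualified class.

    Hypothetical diagnostic helper. A class a.b.C$D is stored in a jar as the
    entry a/b/C$D.class, so the lookup is a plain name comparison.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for path in jar_paths:
        with zipfile.ZipFile(path) as jar:
            if entry in jar.namelist():
                hits.append(path)
    return hits
```

If more than one path comes back for `org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit`, removing the extra copy is a reasonable thing to try before looking further.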

Let's connect on IRC - give me a ping when you're available (user is costin).

Cheers,

On 3/27/14 4:29 PM, Nick Pentreath wrote:

Thanks for the response.

I tried latest Shark (cdh4 version of 0.9.1 here http://cloudera.rst.im/shark/ ) - this uses hadoop 1.0.4 and hive 0.11
I believe, and build elasticsearch-hadoop from github master.

Still getting same error:
org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit cannot be cast to
org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit

Will using hive 0.11 / hadoop 1.0.4 vs hive 0.12 / hadoop 1.2.1 in es-hadoop master make a difference?

Anyone else actually got this working?

On Thu, Mar 20, 2014 at 2:44 PM, Costin Leau <costin.leau@gmail.com mailto:costin.leau@gmail.com> wrote:

I recommend using master - there are several improvements done in this area. Also using the latest Shark (0.9.0) and
Hive (0.12) will help.


On 3/20/14 12:00 PM, Nick Pentreath wrote:

    Hi

    I am struggling to get this working too. I'm just trying locally for now, running Shark 0.8.1, Hive 0.9.0 and ES
    1.0.1
    with ES-hadoop 1.3.0.M2.

    I managed to get a basic example working with WRITING into an index. But I'm really after READING and index.

    I believe I have set everything up correctly, I've added the jar to Shark:
    ADD JAR /path/to/es-hadoop.jar;

    created a table:
    CREATE EXTERNAL TABLE test_read (name string, price double)

    STORED BY 'org.elasticsearch.hadoop.__hive.EsStorageHandler'

    TBLPROPERTIES('es.resource' = 'test_index/test_type/_search?__q=*');


    And then trying to 'SELECT * FROM test _read' gives me :

    org.apache.spark.__SparkException: Job aborted: Task 3.0:0 failed more than 0 times; aborting job
    java.lang.ClassCastException: org.elasticsearch.hadoop.hive.__EsHiveInputFormat$ESHiveSplit cannot be cast to
    org.elasticsearch.hadoop.hive.__EsHiveInputFormat$ESHiveSplit

    at org.apache.spark.scheduler.__DAGScheduler$$anonfun$__abortStage$1.apply(__DAGScheduler.scala:827)

    at org.apache.spark.scheduler.__DAGScheduler$$anonfun$__abortStage$1.apply(__DAGScheduler.scala:825)

    at scala.collection.mutable.__ResizableArray$class.foreach(__ResizableArray.scala:60)

    at scala.collection.mutable.__ArrayBuffer.foreach(__ArrayBuffer.scala:47)

    at org.apache.spark.scheduler.__DAGScheduler.abortStage(__DAGScheduler.scala:825)

    at org.apache.spark.scheduler.__DAGScheduler.processEvent(__DAGScheduler.scala:440)

    at org.apache.spark.scheduler.__DAGScheduler.org
    <http://org.apache.spark.scheduler.DAGScheduler.org>$apache$spark$__scheduler$DAGScheduler$$run(__DAGScheduler.scala:502)

    at org.apache.spark.scheduler.__DAGScheduler$$anon$1.run(__DAGScheduler.scala:157)

    FAILED: Execution Error, return code -101 from shark.execution.SparkTask


    In fact I get the same error thrown when trying to READ from the table that I successfully WROTE to...

    On Saturday, 22 February 2014 12:31:21 UTC+2, Costin Leau wrote:

         Yeah, it might have been some sort of network configuration issue where services where running on different
    machines
         and
         localhost pointed to a different location.

         Either way, I'm glad to hear things have are moving forward.

         Cheers,

         On 22/02/2014 1:06 AM, Max Lang wrote:
         > I managed to get it working on ec2 without issue this time. I'd say the biggest difference was that this
    time I set up a
         > dedicated ES machine. Is it possible that, because I was using a cluster with slaves, when I used
    "localhost" the slaves
         > couldn't find the ES instance running on the master? Or do all the requests go through the master?
         >
         >
         > On Wednesday, February 19, 2014 2:35:40 PM UTC-8, Costin Leau wrote:
         >
         >     Hi,
         >
         >     Setting logging in Hive/Hadoop can be tricky since the log4j needs to be picked up by the running JVM
    otherwise you
         >     won't see anything.
         >     Take a look at this link on how to tell Hive to use your logging settings [1].
         >
         >     For the next release, we might introduce dedicated exceptions for the simple fact that some
    libraries, like Hive,
         >     swallow the stack trace and it's unclear what the issue is which makes the exception
    (IllegalStateException) ambiguous.
         >
         >     Let me know how it goes and whether you will encounter any issues with Shark. Or if you don't :)
         >
         >     Thanks!
         >
         >     [1]https://cwiki.apache.org/__confluence/display/Hive/__GettingStarted#GettingStarted-__ErrorLogs
    <https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs>
         <https://cwiki.apache.org/__confluence/display/Hive/__GettingStarted#GettingStarted-__ErrorLogs
    <https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs>>
         >     <https://cwiki.apache.org/__confluence/display/Hive/__GettingStarted#GettingStarted-__ErrorLogs
    <https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs>
         <https://cwiki.apache.org/__confluence/display/Hive/__GettingStarted#GettingStarted-__ErrorLogs
    <https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs>>>
         >
         >     On 20/02/2014 12:02 AM, Max Lang wrote:
         >     > Hey Costin,
         >     >
         >     > Thanks for the swift reply. I abandoned EC2 to take that out of the equation and managed to get
    everything working
         >     > locally using the latest version of everything (though I realized just now I'm still on hive 0.9).
    I'm guessing you're
         >     > right about some port connection issue because I definitely had ES running on that machine.
         >     >
         >     > I changed hive-log4j.properties and added
         >     > |
         >     > #custom logging levels
         >     > #log4j.logger.xxx=DEBUG
         >     > log4j.logger.org <http://log4j.logger.org>.__elasticsearch.hadoop.rest=__TRACE
         >     >log4j.logger.org.__elasticsearch.hadoop.mr <http://log4j.logger.org.elasticsearch.hadoop.mr>
    <http://log4j.logger.org.__elasticsearch.hadoop.mr <http://log4j.logger.org.elasticsearch.hadoop.mr>>
         <http://log4j.logger.org.__elasticsearch.hadoop.mr <http://log4j.logger.org.elasticsearch.hadoop.mr>
    <http://log4j.logger.org.__elasticsearch.hadoop.mr <http://log4j.logger.org.elasticsearch.hadoop.mr>>>=__TRACE

         >     > |
         >     >
         >     > But I didn't see any trace logging. Hopefully I can get it working on EC2 without issue, but, for
    the future, is this
         >     > the correct way to set TRACE logging?
         >     >
         >     > Oh and, for reference, I tried running without ES up and I got the following, exceptions:
         >     >
         >     > 2014-02-19 13:46:08,803 ERROR shark.SharkDriver (Logging.scala:logError(64)) - FAILED: Hive
    Internal Error:
         >     > java.lang.__IllegalStateException(Cannot discover Elasticsearch version)
         >     > java.lang.__IllegalStateException: Cannot discover Elasticsearch version
         >     > at org.elasticsearch.hadoop.hive.__EsStorageHandler.init(__EsStorageHandler.java:101)
         >     > at
    org.elasticsearch.hadoop.hive.__EsStorageHandler.__configureOutputJobProperties(__EsStorageHandler.java:83)
         >     > at
    org.apache.hadoop.hive.ql.__plan.PlanUtils.__configureJobPropertiesForStora__geHandler(PlanUtils.java:706)
         >     > at
    org.apache.hadoop.hive.ql.__plan.PlanUtils.__configureOutputJobPropertiesFo__rStorageHandler(PlanUtils.__java:675)
         >     > at org.apache.hadoop.hive.ql.__exec.FileSinkOperator.__augmentPlan(FileSinkOperator.__java:764)
         >     > at org.apache.hadoop.hive.ql.__parse.SemanticAnalyzer.__putOpInsertMap(__SemanticAnalyzer.java:1518)
         >     > at org.apache.hadoop.hive.ql.__parse.SemanticAnalyzer.__genFileSinkPlan(__SemanticAnalyzer.java:4337)
         >     > at
    org.apache.hadoop.hive.ql.__parse.SemanticAnalyzer.__genPostGroupByBodyPlan(__SemanticAnalyzer.java:6207)
         >     > at org.apache.hadoop.hive.ql.__parse.SemanticAnalyzer.__genBodyPlan(SemanticAnalyzer.__java:6138)
         >     > at org.apache.hadoop.hive.ql.__parse.SemanticAnalyzer.__genPlan(SemanticAnalyzer.java:__6764)
         >     > at shark.parse.__SharkSemanticAnalyzer.__analyzeInternal(__SharkSemanticAnalyzer.scala:__149)
         >     > at org.apache.hadoop.hive.ql.__parse.BaseSemanticAnalyzer.__analyze(BaseSemanticAnalyzer.__java:244)
         >     > at shark.SharkDriver.compile(__SharkDriver.scala:215)
         >     > at org.apache.hadoop.hive.ql.__Driver.compile(Driver.java:__336)
         >     > at org.apache.hadoop.hive.ql.__Driver.run(Driver.java:895)
         >     > at shark.SharkCliDriver.__processCmd(SharkCliDriver.__scala:324)
         >     > at org.apache.hadoop.hive.cli.__CliDriver.processLine(__CliDriver.java:406)
         >     > at shark.SharkCliDriver$.main(__SharkCliDriver.scala:232)
         >     > at shark.SharkCliDriver.main(__SharkCliDriver.scala)
         >     > Caused by: java.io.IOException: Out of nodes and retries; caught exception
         >     > at org.elasticsearch.hadoop.rest.__NetworkClient.execute(__NetworkClient.java:81)
         >     > at org.elasticsearch.hadoop.rest.__RestClient.execute(RestClient.__java:221)
         >     > at org.elasticsearch.hadoop.rest.__RestClient.execute(RestClient.__java:205)
         >     > at org.elasticsearch.hadoop.rest.__RestClient.execute(RestClient.__java:209)
         >     > at org.elasticsearch.hadoop.rest.__RestClient.get(RestClient.__java:103)
         >     > at org.elasticsearch.hadoop.rest.__RestClient.esVersion(__RestClient.java:274)
         >     > at
    org.elasticsearch.hadoop.rest.__InitializationUtils.__discoverEsVersion(__InitializationUtils.java:84)
         >     > at org.elasticsearch.hadoop.hive.__EsStorageHandler.init(__EsStorageHandler.java:99)
         >     > ... 18 more
         >     > Caused by: java.net.ConnectException: Connection refused
         >     > at java.net.PlainSocketImpl.__socketConnect(Native Method)
         >     > at java.net
    <http://java.net>.__AbstractPlainSocketImpl.__doConnect(__AbstractPlainSocketImpl.java:__339)
         >     > at java.net
    <http://java.net>.__AbstractPlainSocketImpl.__connectToAddress(__AbstractPlainSocketImpl.java:__200)
         >     > at java.net <http://java.net>.__AbstractPlainSocketImpl.__connect(__AbstractPlainSocketImpl.java:__182)
         >     > at java.net.SocksSocketImpl.__connect(SocksSocketImpl.java:__391)
         >     > at java.net.Socket.connect(__Socket.java:579)
         >     > at java.net.Socket.connect(__Socket.java:528)
         >     > at java.net.Socket.<init>(Socket.__java:425)
         >     > at java.net.Socket.<init>(Socket.__java:280)
         >     > at
    org.apache.commons.httpclient.__protocol.__DefaultProtocolSocketFactory.__createSocket(__DefaultProtocolSocketFactory.__java:80)
         >     > at
    org.apache.commons.httpclient.__protocol.__DefaultProtocolSocketFactory.__createSocket(__DefaultProtocolSocketFactory.__java:122)
         >     > at org.apache.commons.httpclient.__HttpConnection.open(__HttpConnection.java:707)
         >     > at org.apache.commons.httpclient.__HttpMethodDirector.__executeWithRetry(__HttpMethodDirector.java:387)
         >     > at org.apache.commons.httpclient.__HttpMethodDirector.__executeMethod(__HttpMethodDirector.java:171)
         >     > at org.apache.commons.httpclient.__HttpClient.executeMethod(__HttpClient.java:397)
         >     > at org.apache.commons.httpclient.__HttpClient.executeMethod(__HttpClient.java:323)
         >     > at
    org.elasticsearch.hadoop.rest.__commonshttp.__CommonsHttpTransport.execute(__CommonsHttpTransport.java:160)
         >     > at org.elasticsearch.hadoop.rest.__NetworkClient.execute(__NetworkClient.java:74)
         >     > ... 25 more
         >     >
         >     > Let me know if there's anything in particular you'd like me to try on EC2.
         >     >
         >     > (For posterity, the versions I used were: hadoop 2.2.0, hive 0.9.0, shark 8.1, spark 8.1, es-hadoop
    1.3.0.M2, java
         >     > 1.7.0_15, scala 2.9.3, elasticsearch 1.0.0)
         >     >
         >     > Thanks again,
         >     > Max
         >     >
         >     > On Tuesday, February 18, 2014 10:16:38 PM UTC-8, Costin Leau wrote:
         >     >
         >     >     The error indicates a network error - namely es-hadoop cannot connect to Elasticsearch on the default
         >     >     (localhost:9200) HTTP port. Can you double check whether that's indeed the case (using curl or even telnet
         >     >     on that port)? Maybe a firewall is preventing connections from being made...
         >     >     Also you could try using the latest Hive, 0.12, and a more recent Hadoop such as 1.1.2 or 1.2.1.
         >     >
         >     >     Additionally, can you enable TRACE logging in your job on the es-hadoop packages
         >     >     org.elasticsearch.hadoop.rest and org.elasticsearch.hadoop.mr and report back?
         >     >
         >     >     Thanks,
         >     >
         >     >     --
         >     >     Costin
         >     >
         >     --
         >     Costin
         >
         --
         Costin

--
Costin

--
Costin


(Nick Pentreath) #11

Hi Costin

Sorry for the silence on this issue. This went a bit quiet.

But the good news is I've come back to it and managed to get it all working
with the new shark 0.9.1 release and 2.0.0RC1. Actually, when I used ADD JAR I
got the same exception, but when I just put the JAR into the shark lib/
folder it worked fine (which seems to point to the classpath issue you
mention).

However, I seem to have an issue with date <-> timestamp conversion.

I have a field in ES called "_ts" that has type "date" and the default
format "dateOptionalTime". When I run a query that includes the timestamp, it
comes back NULL:

select ts from table ...

(Note that I use es.mapping.names to map the _ts field in ES to the ts
field in Hive/Shark, which has type timestamp.)

Below is some of the debug-level output:

14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997506-06-30 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997605-06-28 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997624-06-28 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997629-06-28 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997634-06-29 19:08:168:16.768
NULL
NULL
NULL
NULL
NULL

The data that I index in the _ts field is a timestamp in ms (long). It
doesn't seem to be converted correctly, but the data itself is correct (in ms
at least) and I can query against it using date formats and date math in ES.

Example snippet from debug log from above:
,"_ts":1397130475607}}]}}"

Any ideas or am I doing something silly?

I do see that the Hive timestamp expects either seconds since epoch or a
string-based format that has nanosecond granularity. Is that the issue with
timestamp data stored as a long in ms?
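
One possible workaround for the question above, sketched in Python under the assumption that the millisecond granularity really is the cause (the thread does not confirm this): convert the epoch-millisecond long either to whole seconds or to a `yyyy-MM-dd HH:mm:ss.SSS` string before Hive/Shark reads it as a timestamp. The function names are hypothetical; only the sample value `1397130475607` comes from the debug log.

```python
from datetime import datetime, timezone


def epoch_millis_to_seconds(millis: int) -> int:
    """Drop the millisecond part so the value reads as seconds since epoch."""
    return millis // 1000


def epoch_millis_to_hive_timestamp(millis: int) -> str:
    """Format epoch milliseconds as 'yyyy-MM-dd HH:mm:ss.SSS' in UTC."""
    # Integer arithmetic avoids float rounding of the millisecond part.
    seconds, ms = divmod(millis, 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S") + ".{:03d}".format(ms)


sample = 1397130475607  # the _ts value shown in the debug log
print(epoch_millis_to_seconds(sample))
print(epoch_millis_to_hive_timestamp(sample))
```

Whether such a conversion belongs at indexing time or in a query over the table depends on the setup; the sketch only illustrates the arithmetic.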

Thanks
Nick

On Thu, Mar 27, 2014 at 4:50 PM, Costin Leau costin.leau@gmail.com wrote:

Using the latest hive and hadoop is preferred as they contain various bug
fixes.
The error suggests a classpath issue - namely the same class is loaded
twice for some reason and hence the casting fails.

Let's connect on IRC - give me a ping when you're available (user is
costin).

Cheers,

On 3/27/14 4:29 PM, Nick Pentreath wrote:

Thanks for the response.

I tried the latest Shark (cdh4 version of 0.9.1 here
http://cloudera.rst.im/shark/ ) - this uses hadoop 1.0.4 and hive 0.11
I believe - and built elasticsearch-hadoop from github master.

Still getting same error:
org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit cannot be
cast to
org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit

Will using hive 0.11 / hadoop 1.0.4 vs hive 0.12 / hadoop 1.2.1 in
es-hadoop master make a difference?

Anyone else actually got this working?

On Thu, Mar 20, 2014 at 2:44 PM, Costin Leau <costin.leau@gmail.com> wrote:

I recommend using master - there are several improvements done in this area.
Also using the latest Shark (0.9.0) and Hive (0.12) will help.

On 3/20/14 12:00 PM, Nick Pentreath wrote:

    Hi

    I am struggling to get this working too. I'm just trying locally for now, running Shark 0.8.1, Hive 0.9.0 and
    ES 1.0.1 with ES-hadoop 1.3.0.M2.

    I managed to get a basic example working with WRITING into an index. But I'm really after READING an index.

    I believe I have set everything up correctly. I've added the jar to Shark:
    ADD JAR /path/to/es-hadoop.jar;

    created a table:
    CREATE EXTERNAL TABLE test_read (name string, price double)
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES('es.resource' = 'test_index/test_type/_search?q=*');

    And then trying to 'SELECT * FROM test_read' gives me:

    org.apache.spark.SparkException: Job aborted: Task 3.0:0 failed more than 0 times; aborting job
    java.lang.ClassCastException: org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit cannot be cast to org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
    at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)

    FAILED: Execution Error, return code -101 from shark.execution.SparkTask

    In fact I get the same error thrown when trying to READ from the table that I successfully WROTE to...

    On Saturday, 22 February 2014 12:31:21 UTC+2, Costin Leau wrote:

         Yeah, it might have been some sort of network configuration issue where services were running on different
         machines and localhost pointed to a different location.

         Either way, I'm glad to hear things are moving forward.

         Cheers,
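
The localhost mix-up described here can be checked from each machine with a small probe. This is a minimal sketch, not something taken from the thread: `can_connect` and `probe` are hypothetical helper names, and the only assumption is that Elasticsearch listens on its HTTP port (9200 by default, as mentioned earlier in the thread).

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, DNS failures and timeouts.
        return False


def probe(endpoints, timeout=2.0):
    """Map each 'host:port' label to whether it is reachable from this machine."""
    return {
        "{}:{}".format(host, port): can_connect(host, port, timeout)
        for host, port in endpoints
    }
```

Running `probe([("localhost", 9200)])` on the master and again on a slave would show whether `localhost` reaches an Elasticsearch node from both; on a slave with no local node it would be expected to report `False`, which matches the "Connection refused" in the earlier stack trace.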

         On 22/02/2014 1:06 AM, Max Lang wrote:
         > I managed to get it working on ec2 without issue this time. I'd say the biggest difference was that this
         > time I set up a dedicated ES machine. Is it possible that, because I was using a cluster with slaves, when
         > I used "localhost" the slaves couldn't find the ES instance running on the master? Or do all the requests
         > go through the master?
         >
         >
         > On Wednesday, February 19, 2014 2:35:40 PM UTC-8, Costin Leau wrote:
         >
         >     Hi,
         >
         >     Setting logging in Hive/Hadoop can be tricky since the log4j needs to be picked up by the running JVM,
         >     otherwise you won't see anything.
         >     Take a look at this link on how to tell Hive to use your logging settings [1].
         >
         >     For the next release, we might introduce dedicated exceptions for the simple fact that some libraries,
         >     like Hive, swallow the stack trace and it's unclear what the issue is, which makes the exception
         >     (IllegalStateException) ambiguous.
         >
         >     Let me know how it goes and whether you will encounter any issues with Shark. Or if you don't :)
         >
         >     Thanks!
         >
         >     [1] https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs
         >
> On 20/02/2014 12:02 AM, Max Lang wrote:
> > Hey Costin,
> >
> > Thanks for the swift reply. I abandoned EC2 to take
that out of the equation and managed to get
everything working
> > locally using the latest version of everything
(though I realized just now I'm still on hive 0.9).
I'm guessing you're
> > right about some port connection issue because I
definitely had ES running on that machine.
> >
> > I changed hive-log4j.properties and added
> > |
> > #custom logging levels
> > #log4j.logger.xxx=DEBUG
> > log4j.logger.org http://log4j.logger.org.__
elasticsearch.hadoop.rest=__TRACE
> >log4j.logger.org.__elasticsearch.hadoop.mr <
http://log4j.logger.org.elasticsearch.hadoop.mr>
<http://log4j.logger.org.__elasticsearch.hadoop.mr <
http://log4j.logger.org.elasticsearch.hadoop.mr>>
<http://log4j.logger.org.__elasticsearch.hadoop.mr <
http://log4j.logger.org.elasticsearch.hadoop.mr>
<http://log4j.logger.org.__elasticsearch.hadoop.mr <
http://log4j.logger.org.elasticsearch.hadoop.mr>>>=__TRACE

         >     > |
         >     >
         >     > But I didn't see any trace logging. Hopefully I can

get it working on EC2 without issue, but, for
the future, is this
> > the correct way to set TRACE logging?
> >
> > Oh and, for reference, I tried running without ES up
and I got the following, exceptions:
> >
> > 2014-02-19 13:46:08,803 ERROR shark.SharkDriver
(Logging.scala:logError(64)) - FAILED: Hive
Internal Error:
> > java.lang.__IllegalStateException(Cannot discover
Elasticsearch version)
> > java.lang.__IllegalStateException: Cannot discover
Elasticsearch version
> > at org.elasticsearch.hadoop.hive.
__EsStorageHandler.init(__EsStorageHandler.java:101)
> > at
org.elasticsearch.hadoop.hive.EsStorageHandler.
configureOutputJobProperties(__EsStorageHandler.java:83)
> > at
org.apache.hadoop.hive.ql.plan.PlanUtils.
configureJobPropertiesForStora__geHandler(PlanUtils.java:706)
> > at
org.apache.hadoop.hive.ql.plan.PlanUtils.
configureOutputJobPropertiesFo__rStorageHandler(PlanUtils.java:675)
> > at org.apache.hadoop.hive.ql.

exec.FileSinkOperator.__augmentPlan(FileSinkOperator.java:764)
> > at org.apache.hadoop.hive.ql.

parse.SemanticAnalyzer.__putOpInsertMap(SemanticAnalyzer.java:1518)
> > at org.apache.hadoop.hive.ql.

parse.SemanticAnalyzer.__genFileSinkPlan(__SemanticAnalyzer.java:4337)
> > at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.
genPostGroupByBodyPlan(SemanticAnalyzer.java:6207)
> > at org.apache.hadoop.hive.ql.

parse.SemanticAnalyzer.__genBodyPlan(SemanticAnalyzer.java:6138)
> > at org.apache.hadoop.hive.ql.

parse.SemanticAnalyzer.__genPlan(SemanticAnalyzer.java:__6764)
> > at shark.parse.SharkSemanticAnalyzer.
analyzeInternal(SharkSemanticAnalyzer.scala:149)
> > at org.apache.hadoop.hive.ql.

parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:244)
> > at shark.SharkDriver.compile(

SharkDriver.scala:215)
> > at org.apache.hadoop.hive.ql.

Driver.compile(Driver.java:336)
> > at org.apache.hadoop.hive.ql.

Driver.run(Driver.java:895)
> > at shark.SharkCliDriver.

processCmd(SharkCliDriver.scala:324)
> > at org.apache.hadoop.hive.cli.

CliDriver.processLine(CliDriver.java:406)
> > at shark.SharkCliDriver$.main(

SharkCliDriver.scala:232)
> > at shark.SharkCliDriver.main(__SharkCliDriver.scala)

         >     > Caused by: java.io.IOException: Out of nodes and

retries; caught exception
> > at org.elasticsearch.hadoop.rest.
__NetworkClient.execute(__NetworkClient.java:81)
> > at org.elasticsearch.hadoop.rest.
__RestClient.execute(RestClient.__java:221)
> > at org.elasticsearch.hadoop.rest.
__RestClient.execute(RestClient.__java:205)
> > at org.elasticsearch.hadoop.rest.
__RestClient.execute(RestClient.__java:209)
> > at org.elasticsearch.hadoop.rest.
__RestClient.get(RestClient.__java:103)
> > at org.elasticsearch.hadoop.rest.
__RestClient.esVersion(__RestClient.java:274)
> > at
org.elasticsearch.hadoop.rest.InitializationUtils.
discoverEsVersion(__InitializationUtils.java:84)
> > at org.elasticsearch.hadoop.hive.
__EsStorageHandler.init(__EsStorageHandler.java:99)

         >     > ... 18 more
         >     > Caused by: java.net.ConnectException: Connection

refused
> > at java.net.PlainSocketImpl.__socketConnect(Native
Method)
> > at java.net
http://java.net.__AbstractPlainSocketImpl.doConnect(
AbstractPlainSocketImpl.java:__339)
> > at java.net
http://java.net.__AbstractPlainSocketImpl.connectToAddress(
AbstractPlainSocketImpl.java:200)
> > at java.net http://java.net.

AbstractPlainSocketImpl.__connect(__AbstractPlainSocketImpl.java:182)
> > at java.net.SocksSocketImpl.

connect(SocksSocketImpl.java:__391)
> > at java.net.Socket.connect(__Socket.java:579)
> > at java.net.Socket.connect(__Socket.java:528)
> > at java.net.Socket.(Socket.__java:425)
> > at java.net.Socket.(Socket.__java:280)
> > at
org.apache.commons.httpclient.protocol.
DefaultProtocolSocketFactory.createSocket(
DefaultProtocolSocketFactory.__java:80)
> > at
org.apache.commons.httpclient.protocol.
DefaultProtocolSocketFactory.createSocket(
DefaultProtocolSocketFactory.__java:122)
> > at org.apache.commons.httpclient.
__HttpConnection.open(__HttpConnection.java:707)
> > at org.apache.commons.httpclient.
__HttpMethodDirector.__executeWithRetry(__HttpMethodDirector.java:387)
> > at org.apache.commons.httpclient.
__HttpMethodDirector.__executeMethod(__HttpMethodDirector.java:171)
> > at org.apache.commons.httpclient.
__HttpClient.executeMethod(__HttpClient.java:397)
> > at org.apache.commons.httpclient.
__HttpClient.executeMethod(__HttpClient.java:323)
> > at
org.elasticsearch.hadoop.rest.commonshttp.
CommonsHttpTransport.execute(__CommonsHttpTransport.java:160)
> > at org.elasticsearch.hadoop.rest.
__NetworkClient.execute(__NetworkClient.java:74)

         >     > ... 25 more
         >     >
         >     > Let me know if there's anything in particular you'd

like me to try on EC2.
> >
> > (For posterity, the versions I used were: hadoop
2.2.0, hive 0.9.0, shark 8.1, spark 8.1, es-hadoop
1.3.0.M2, java
> > 1.7.0_15, scala 2.9.3, elasticsearch 1.0.0)
> >
> > Thanks again,
> > Max
> >
> > On Tuesday, February 18, 2014 10:16:38 PM UTC-8,
Costin Leau wrote:
> >
> > The error indicates a network error - namely
es-hadoop cannot connect to Elasticsearch on the
default (localhost:9200)
> > HTTP port. Can you double check whether that's
indeed the case (using curl or even telnet on
that port) - maybe the
> > firewall prevents any connections to be made...
> > Also you could try using the latest Hive, 0.12
and a more recent Hadoop such as 1.1.2 or 1.2.1.
> >
> > Additionally, can you enable TRACE logging in
your job on es-hadoop packages
org.elasticsearch.hadoop.rest and
> >org.elasticsearch.hadoop.mr <
http://org.elasticsearch.hadoop.mr>
<http://org.elasticsearch.__hadoop.mr <http://org.elasticsearch.
hadoop.mr>>
<http://org.elasticsearch.__hadoop.mr <http://org.elasticsearch.
hadoop.mr>
<http://org.elasticsearch.__hadoop.mr <
http://org.elasticsearch.hadoop.mr>>>
<http://org.elasticsearch.__hadoop.mr <http://org.elasticsearch.
hadoop.mr> <http://org.elasticsearch.__hadoop.mr
http://org.elasticsearch.hadoop.mr>

         >     <http://org.elasticsearch.__hadoop.mr <

http://org.elasticsearch.hadoop.mr>
<http://org.elasticsearch.__hadoop.mr <http://org.elasticsearch.
hadoop.mr>>>> packages and report back ?

         >     >
         >     >     Thanks,
         >     >
> > On 19/02/2014 4:03 AM, Max Lang wrote:
> > > I set everything up using this guide: https://github.com/amplab/shark/wiki/Running-Shark-on-EC2 on an ec2 cluster. I've
> > > copied the elasticsearch-hadoop jars into the hive lib directory and I have elasticsearch running on localhost:9200. I'm
> > > running shark in a screen session with --service screenserver and connecting to it at the same time using shark -h
> > > localhost.
> > >
> > > Unfortunately, when I attempt to write data into elasticsearch, it fails. Here's an example:
> > >
> > > [localhost:10000] shark> CREATE EXTERNAL TABLE wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3n://spark-data/wikipedia-sample/';
> > > Time taken (including network latency): 0.159 seconds
> > > 14/02/19 01:23:33 INFO CliDriver: Time taken (including network latency): 0.159 seconds
> > >
> > > [localhost:10000] shark> SELECT title FROM wiki LIMIT 1;
> > > Alpokalja
> > > Time taken (including network latency): 2.23 seconds
> > > 14/02/19 01:23:48 INFO CliDriver: Time taken (including network latency): 2.23 seconds
> > >
> > > [localhost:10000] shark> CREATE EXTERNAL TABLE es_wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'wikipedia/article');
> > > Time taken (including network latency): 0.061 seconds
> > > 14/02/19 01:33:51 INFO CliDriver: Time taken (including network latency): 0.061 seconds
> > >
> > > [localhost:10000] shark> INSERT OVERWRITE TABLE es_wiki SELECT w.id, w.title, w.last_modified, w.xml, w.text FROM wiki w;
> > > [Hive Error]: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
> > > Time taken (including network latency): 3.575 seconds
> > > 14/02/19 01:34:42 INFO CliDriver: Time taken (including network latency): 3.575 seconds
> > >
> > > The stack trace looks like this:
> > >
> > > org.apache.hadoop.hive.ql.metadata.HiveException (org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Out of nodes and retries; caught exception)
> > > org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:602)
> > > shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:84)
> > > shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:81)
> > > scala.collection.Iterator$class.foreach(Iterator.scala:772)
> > > scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
> > > shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:81)
> > > shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:207)
> > > shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
> > > shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
> > > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
> > > org.apache.spark.scheduler.Task.run(Task.scala:53)
> > > org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
> > > org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
> > > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
> > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > java.lang.Thread.run(Thread.java:744)
> > >
> > > I should be using Hive 0.9.0, shark 0.8.1, elasticsearch 1.0.0, Hadoop 1.0.4, and java 1.7.0_51
> > > Based on my cursory look at the hadoop and elasticsearch-hadoop sources, it looks like hive is just rethrowing an
> > > IOException it's getting from Spark, and elasticsearch-hadoop is just hitting those exceptions.
> > > I suppose my questions are: Does this look like an issue with my ES/elasticsearch-hadoop config? And has anyone gotten
> > > elasticsearch working with Spark/Shark?
> > > Any ideas/insights are appreciated.
> > > Thanks, Max
> > >

(Costin Leau) #12

Hi Nick,

I'm glad to see you are making progress. This week I'm mainly on the road but maybe we can meet on IRC next week, my
invitation still stands :slight_smile:
Timestamp is a relatively new type and doesn't handle timezones properly - it is backed by java.sql.Timestamp so it
inherits a lot of its issues.
For some reason the year in your date is rather off so it's worth checking the data read by es-hadoop before passing it
to Hive (see [1]).
I've had issues with it myself, and the moment the cluster is in a different timezone than the dataset itself things
get buggy.
Try using a UDF to do the conversion from the long to a timestamp - I've tried doing something similar in our conversion
but since we don't know the timezones used, it's easy for things to get mixed up.

Cheers,

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/troubleshooting.html
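
As a rough illustration of the long-to-timestamp conversion suggested above, here is a minimal Python sketch (not es-hadoop or Hive code). It assumes the stored value is milliseconds since the Unix epoch and that UTC is the intended timezone, which the thread does not confirm. The function names are hypothetical. The sample value is the `_ts` from the debug log quoted below.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the long stored in ES counts milliseconds since 1970-01-01 UTC.
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_utc(ms: int) -> datetime:
    """Interpret ms as milliseconds since the Unix epoch and return a UTC datetime."""
    # Integer timedelta arithmetic avoids float rounding of the millisecond part.
    return EPOCH_UTC + timedelta(milliseconds=ms)


def to_timestamp_string(ms: int) -> str:
    """Render the instant as 'YYYY-MM-DD HH:MM:SS.fff', pinned to UTC."""
    dt = epoch_millis_to_utc(ms)
    return dt.strftime("%Y-%m-%d %H:%M:%S") + ".%03d" % (ms % 1000)


if __name__ == "__main__":
    # The _ts value from the debug log in this thread.
    print(to_timestamp_string(1397130475607))  # prints 2014-04-10 11:47:55.607
```

Pinning the timezone explicitly is the point of the sketch: the same long rendered in the cluster's local timezone would give a different wall-clock string, which is the kind of mix-up described above.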

On 5/13/14 8:25 PM, Nick Pentreath wrote:

Hi Costin

Sorry for the silence on this issue. This went a bit quiet.

But the good news is I've come back to it and managed to get it all working with the new shark 0.9.1 release and
2.0.0RC1. Actually if I used ADD JAR I got the same exception but when I just put the JAR into the shark lib/ folder it
worked fine (which seems to point to the classpath issue you mention).

However, I seem to have an issue with date <-> timestamp conversion.

I have a field in ES called "_ts" that has type "date" and the default format "dateOptionalTime". When I do a query that
includes the timestamp it comes back NULL:

select ts from table ...
(note I use a correct es.mapping.names to map the _ts field in ES to ts field in Hive/Shark that has timestamp type).

below is some of the debug-level output:

14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997506-06-30 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997605-06-28 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997624-06-28 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997629-06-28 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997634-06-29 19:08:168:16.768
NULL
NULL
NULL
NULL
NULL

The data that I index in the _ts field is timestamp in ms (long). It doesn't seem to be converted correctly but the data
is correct (in ms at least) and I can query against it using date formats and date math in ES.

Example snippet from debug log from above:
,"_ts":1397130475607}}]}}"

Any ideas or am I doing something silly?

I do see that the Hive timestamp expects either seconds since epoch or a string-based format that has nanosecond
granularity. Is this the issue with just ms long timestamp data?

Thanks
Nick

On Thu, Mar 27, 2014 at 4:50 PM, Costin Leau <costin.leau@gmail.com> wrote:

Using the latest hive and hadoop is preferred as they contain various bug fixes.
The error suggests a classpath issue - namely the same class is loaded twice for some reason and hence the casting
fails.

Let's connect on IRC - give me a ping when you're available (user is costin).

Cheers,


On 3/27/14 4:29 PM, Nick Pentreath wrote:

    Thanks for the response.

    I tried latest Shark (cdh4 version of 0.9.1 here http://cloudera.rst.im/shark/ ) - this uses hadoop 1.0.4 and hive 0.11
    I believe, and built elasticsearch-hadoop from github master.

    Still getting same error:
    org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit cannot be cast to
    org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit

    Will using hive 0.11 / hadoop 1.0.4 vs hive 0.12 / hadoop 1.2.1 in es-hadoop master make a difference?


    Anyone else actually got this working?



    On Thu, Mar 20, 2014 at 2:44 PM, Costin Leau <costin.leau@gmail.com> wrote:

         I recommend using master - there are several improvements done in this area. Also using the latest Shark (0.9.0) and
         Hive (0.12) will help.


         On 3/20/14 12:00 PM, Nick Pentreath wrote:

             Hi

             I am struggling to get this working too. I'm just trying locally for now, running Shark 0.8.1, Hive 0.9.0 and ES 1.0.1
             with ES-hadoop 1.3.0.M2.

             I managed to get a basic example working with WRITING into an index. But I'm really after READING an index.

             I believe I have set everything up correctly, I've added the jar to Shark:
             ADD JAR /path/to/es-hadoop.jar;

             created a table:
             CREATE EXTERNAL TABLE test_read (name string, price double)
             STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
             TBLPROPERTIES('es.resource' = 'test_index/test_type/_search?q=*');

             And then trying to 'SELECT * FROM test_read' gives me :

             org.apache.spark.SparkException: Job aborted: Task 3.0:0 failed more than 0 times; aborting job
             java.lang.ClassCastException: org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit cannot be cast to
             org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit
             at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
             at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
             at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
             at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
             at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
             at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
             at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
             at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)


             FAILED: Execution Error, return code -101 from shark.execution.SparkTask


             In fact I get the same error thrown when trying to READ from the table that I successfully WROTE to...

             On Saturday, 22 February 2014 12:31:21 UTC+2, Costin Leau wrote:

                  Yeah, it might have been some sort of network configuration issue where services were running on different machines
                  and localhost pointed to a different location.

                  Either way, I'm glad to hear things are moving forward.

                  Cheers,

                  On 22/02/2014 1:06 AM, Max Lang wrote:
                  > I managed to get it working on ec2 without issue this time. I'd say the biggest difference was that this time I set up a
                  > dedicated ES machine. Is it possible that, because I was using a cluster with slaves, when I used "localhost" the slaves
                  > couldn't find the ES instance running on the master? Or do all the requests go through the master?
                  >
                  >
                  > On Wednesday, February 19, 2014 2:35:40 PM UTC-8, Costin Leau wrote:
                  >
                  >     Hi,
                  >
                  >     Setting logging in Hive/Hadoop can be tricky since the log4j needs to be picked up by the running JVM otherwise you
                  >     won't see anything.
                  >     Take a look at this link on how to tell Hive to use your logging settings [1].
                  >
                  >     For the next release, we might introduce dedicated exceptions for the simple fact that some libraries, like Hive,
                  >     swallow the stack trace and it's unclear what the issue is which makes the exception (IllegalStateException) ambiguous.
                  >
                  >     Let me know how it goes and whether you will encounter any issues with Shark. Or if you don't :)
                  >
                  >     Thanks!
                  >
                  >
                  >     [1] https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs
                  >
                  >     On 20/02/2014 12:02 AM, Max Lang wrote:
                  >     > Hey Costin,
                  >     >
                  >     > Thanks for the swift reply. I abandoned EC2 to take that out of the equation and managed to get everything working
                  >     > locally using the latest version of everything (though I realized just now I'm still on hive 0.9). I'm guessing you're
                  >     > right about some port connection issue because I definitely had ES running on that machine.
                  >     >
                  >     > I changed hive-log4j.properties and added
                  >     >
                  >     > #custom logging levels
                  >     > #log4j.logger.xxx=DEBUG
                  >     > log4j.logger.org.elasticsearch.hadoop.rest=TRACE
                  >     > log4j.logger.org.elasticsearch.hadoop.mr=TRACE
                  >     >
                  >     > But I didn't see any trace logging. Hopefully I can get it working on EC2 without issue, but, for the future, is this
                  >     > the correct way to set TRACE logging?
                  >     >
                  >     > Oh and, for reference, I tried running without ES up and I got the following, exceptions:
                  >     >
                  >     > 2014-02-19 13:46:08,803 ERROR shark.SharkDriver (Logging.scala:logError(64)) - FAILED: Hive Internal Error:
                  >     > java.lang.IllegalStateException(Cannot discover Elasticsearch version)
                  >     > java.lang.IllegalStateException: Cannot discover Elasticsearch version
                  >     > at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:101)
                  >     > at org.elasticsearch.hadoop.hive.EsStorageHandler.configureOutputJobProperties(EsStorageHandler.java:83)
                  >     > at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:706)
                  >     > at org.apache.hadoop.hive.ql.plan.PlanUtils.configureOutputJobPropertiesForStorageHandler(PlanUtils.java:675)
                  >     > at org.apache.hadoop.hive.ql.exec.FileSinkOperator.augmentPlan(FileSinkOperator.java:764)
                  >     > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.putOpInsertMap(SemanticAnalyzer.java:1518)
                  >     > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:4337)
                  >     > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6207)
                  >     > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6138)
                  >     > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6764)
                  >     > at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:149)
                  >     > at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:244)
                  >     > at shark.SharkDriver.compile(SharkDriver.scala:215)
                  >     > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
                  >     > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:895)
                  >     > at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:324)
                  >     > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
                  >     > at shark.SharkCliDriver$.main(SharkCliDriver.scala:232)
                  >     > at shark.SharkCliDriver.main(SharkCliDriver.scala)
                  >     > Caused by: java.io.IOException: Out of nodes and retries; caught exception
                  >     > at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
                  >     > at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
                  >     > at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
                  >     > at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
                  >     > at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
                  >     > at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:274)
                  >     > at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:84)
                  >     > at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:99)
                  >     > ... 18 more
                  >     > Caused by: java.net.ConnectException: Connection refused
                  >     > at java.net.PlainSocketImpl.socketConnect(Native Method)
                  >     > at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
                  >     > at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
                  >     > at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
                  >     > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
                  >     > at java.net.Socket.connect(Socket.java:579)
                  >     > at java.net.Socket.connect(Socket.java:528)
                  >     > at java.net.Socket.<init>(Socket.java:425)
                  >     > at java.net.Socket.<init>(Socket.java:280)
                  >     > at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
                  >     > at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
                  >     > at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
                  >     > at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
                  >     > at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
                  >     > at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
                  >     > at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
                  >     > at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
                  >     > at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
                  >     > ... 25 more
                  >     >
                  >     > Let me know if there's anything in particular you'd like me to try on EC2.
                  >     >
                  >     > (For posterity, the versions I used were: hadoop 2.2.0, hive 0.9.0, shark 8.1, spark 8.1, es-hadoop 1.3.0.M2,
                  >     > java 1.7.0_15, scala 2.9.3, elasticsearch 1.0.0)
                  >     >
                  >     > Thanks again,
                  >     > Max
                  >     >
                  >     > On Tuesday, February 18, 2014 10:16:38 PM UTC-8, Costin Leau wrote:
                  >     >
                  >     >     The error indicates a network error - namely es-hadoop cannot connect to Elasticsearch on the default (localhost:9200)
                  >     >     HTTP port. Can you double check whether that's indeed the case (using curl or even telnet on that port) - maybe the
                  >     >     firewall prevents any connections to be made...
                  >     >     Also you could try using the latest Hive, 0.12 and a more recent Hadoop such as 1.1.2 or 1.2.1.
                  >     >
                  >     >     Additionally, can you enable TRACE logging in your job on es-hadoop packages org.elasticsearch.hadoop.rest and
                  >     >     org.elasticsearch.hadoop.mr packages and report back ?
                  >     >
                  >     >     Thanks,
                  >     >
                  >     >     On 19/02/2014 4:03 AM, Max Lang wrote:
                  >     >     > I set everything up using this guide: https://github.com/amplab/shark/wiki/Running-Shark-on-EC2 on an ec2 cluster. I've
                  >     >     > copied the elasticsearch-hadoop jars into the hive lib directory and I have elasticsearch running on localhost:9200. I'm
                  >     >     > running shark in a screen session with --service screenserver and connecting to it at the same time using shark -h
                  >     >     > localhost.
                  >     >     >
                  >     >     > Unfortunately, when I attempt to write data into elasticsearch, it fails. Here's an example:
                  >     >     >
                  >     >     > [localhost:10000] shark> CREATE EXTERNAL TABLE wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING)
                  >     >     > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3n://spark-data/wikipedia-sample/';
                  >     >     > Time taken (including network latency): 0.159 seconds
                  >     >     > 14/02/19 01:23:33 INFO CliDriver: Time taken (including network latency): 0.159 seconds
                  >     >     >
                  >     >     > [localhost:10000] shark> SELECT title FROM wiki LIMIT 1;
                  >     >     > Alpokalja
                  >     >     > Time taken (including network latency): 2.23 seconds
                  >     >     > 14/02/19 01:23:48 INFO CliDriver: Time taken (including network latency): 2.23 seconds
                  >     >     >
                  >     >     > [localhost:10000] shark> CREATE EXTERNAL TABLE es_wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING)
                  >     >     > STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'wikipedia/article');
                  >     >     > Time taken (including network latency): 0.061 seconds
                  >     >     > 14/02/19 01:33:51 INFO CliDriver: Time taken (including network latency): 0.061 seconds
                  >     >     >
                  >     >     > [localhost:10000] shark> INSERT OVERWRITE TABLE es_wiki SELECT w.id, w.title, w.last_modified, w.xml, w.text FROM wiki w;
                  >     >     > [Hive Error]: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
                  >     >     > Time taken (including network latency): 3.575 seconds
                  >     >     > 14/02/19 01:34:42 INFO CliDriver: Time taken (including network latency): 3.575 seconds
                  >     >     >
                  >     >     > *The stack trace looks like this:*
                  >     >     >
                  >     >     > org.apache.hadoop.hive.ql.metadata.HiveException (org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException:
                  >     >     > Out of nodes and retries; caught exception)
                  >     >     >
                  >     >     > org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:602)
                  >     >     > shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:84)
                  >     >     > shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:81)
                  >     >     > scala.collection.Iterator$class.foreach(Iterator.scala:772)
                  >     >     > scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
                  >     >     > shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:81)
                  >     >     > shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:207)
                  >     >     > shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
                  >     >     > shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
                  >     >     > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
                  >     >     > org.apache.spark.scheduler.Task.run(Task.scala:53)
                  >     >     > org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
                  >     >     > org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
                  >     >     > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
                  >     >     > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                  >     >     > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                  >     >     > java.lang.Thread.run(Thread.java:744)
                  >     >     >
                  >     >     > I should be using Hive 0.9.0, shark 0.8.1, elasticsearch 1.0.0, Hadoop 1.0.4, and java 1.7.0_51
                  >     >     > Based on my cursory look at the hadoop and elasticsearch-hadoop sources, it looks like hive is just rethrowing an
                  >     >     > IOException it's getting from Spark, and elasticsearch-hadoop is just hitting those exceptions.
                  >     >     > I suppose my questions are: Does this look like an issue with my ES/elasticsearch-hadoop config? And has anyone gotten
                  >     >     > elasticsearch working with Spark/Shark?
                  >     >     > Any ideas/insights are appreciated.
                  >     >     > Thanks, Max
                  >     >     >
                  >     >
                  >     >     --
                  >     >     Costin
                  >     >
                  >
                  >     --
                  >     Costin
                  >
                  --
                  Costin


         --
         Costin


--
Costin

--
Costin


(Nick Pentreath) #13

Ok - well let me know when you're around.

The mapreduce inputformat works fine. I'm using it with Spark to access the
ES data via ESInputFormat and run analytics and machine learning jobs on
that data, and the same _ts field works and is the correct data (though it
comes through as org.apache.hadoop.io.Text, which I convert to Long or a
DateTime as required).
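
For readers doing the same conversion, here is a minimal sketch in plain Java of turning the raw text of such a field into a long and a date value. It assumes the field holds epoch milliseconds (as the "_ts" value in the debug log quoted further down does) and that the `org.apache.hadoop.io.Text` value has already been turned into a `String` with `toString()`. The class and method names are hypothetical, not part of es-hadoop or Spark, and `java.time` needs Java 8 or later.

```java
import java.time.Instant;

/** Hypothetical helper (not part of es-hadoop): types the raw text of an epoch-millisecond field. */
final class EpochMillisText {

    /** Parses text such as "1397130475607" into a long; rejects anything that is not a plain integer. */
    static long toEpochMillis(String raw) {
        String trimmed = raw.trim();
        try {
            return Long.parseLong(trimmed);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Not an epoch-millisecond value: '" + raw + "'", e);
        }
    }

    /** Interprets the value as milliseconds since 1970-01-01T00:00:00Z. */
    static Instant toInstant(String raw) {
        return Instant.ofEpochMilli(toEpochMillis(raw));
    }

    public static void main(String[] args) {
        // Sample value copied from the "_ts" field in the debug log quoted in this thread.
        String raw = "1397130475607";
        System.out.println(toEpochMillis(raw)); // prints 1397130475607
        System.out.println(toInstant(raw));     // prints 2014-04-10T11:47:55.607Z
    }
}
```

Failing loudly on non-numeric input is a deliberate choice in this sketch: a date string slipping through would otherwise surface much later as a silent NULL.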

Perhaps I'm missing it somewhere, but is it possible to force a field to be
a given type? i.e. similar to es.field.mapping, could I tell it that it must
parse the field as a string (since then I can take it and do whatever
parsing / casting I want)?

I could just use the new Spark SQL module (which I'm seriously considering
right now having explored it a bit in the last few days), but some of the
stuff we do requires a SQL Console and JDBC, so having Shark able to just
pull in ES data is definitely very useful...

On Tue, May 13, 2014 at 8:18 PM, Costin Leau costin.leau@gmail.com wrote:

Hi Nick,

I'm glad to see you are making progress. This week I'm mainly on the road
but maybe we can meet on IRC next week, my invitation still stands :)
Timestamp is a relatively new type and doesn't handle timezones properly -
it is backed by java.sql.Timestamp so it inherits a lot of its issues.
For some reason the year in your date is rather off so it's worth checking
the data read by es-hadoop before passing it to Hive (see [1]).
I've had issues with it myself: the moment the cluster is in a
different timezone than the dataset itself, things get buggy.
Try using a UDF to do the conversion from the long to a timestamp - I've
tried doing something similar in our conversion but since we don't know the
timezones used, it's easy for things to get mixed up.

Cheers,

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/troubleshooting.html
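
The long-to-timestamp conversion suggested above could look roughly like the following. This is only a sketch of the conversion logic in plain Java, with the Hive UDF wrapper left out so that it runs without Hive on the classpath. It assumes the field holds epoch milliseconds and that the caller passes the intended time zone explicitly, which is what avoids the mix-ups described above. The class and method names are hypothetical, and `java.time` needs Java 8 or later.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical conversion core for a UDF; the Hive UDF wrapper itself is not shown. */
final class MillisToHiveTimestamp {

    // Hive's textual timestamp layout is yyyy-MM-dd HH:mm:ss with an optional fraction.
    private static final DateTimeFormatter HIVE_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    /** Renders epoch milliseconds as wall-clock time in the given zone. */
    static String format(long epochMillis, ZoneId zone) {
        return HIVE_FORMAT.withZone(zone).format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long ts = 1397130475607L; // "_ts" value from the debug log in this thread
        System.out.println(format(ts, ZoneOffset.UTC));        // prints 2014-04-10 11:47:55.607
        System.out.println(format(ts, ZoneOffset.ofHours(2))); // prints 2014-04-10 13:47:55.607
    }
}
```

The two printed lines describe the same instant. They differ only in the zone used for rendering, which is why a cluster and a dataset in different time zones can disagree about the stored value.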

On 5/13/14 8:25 PM, Nick Pentreath wrote:

Hi Costin

Sorry for the silence on this issue. This went a bit quiet.

But the good news is I've come back to it and managed to get it all
working with the new shark 0.9.1 release and
2.0.0RC1. Actually if I used ADD JAR I got the same exception but when I
just put the JAR into the shark lib/ folder it
worked fine (which seems to point to the classpath issue you mention).

However, I seem to have an issue with date <-> timestamp conversion.

I have a field in ES called "_ts" that has type "date" and the default
format "dateOptionalTime". When I do a query that
includes the timestamp it comes back NULL:

select ts from table ...
(note I use a correct es.mapping.names to map the _ts field in ES to ts
field in Hive/Shark that has timestamp type).

below is some of the debug-level output:

14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997506-06-30 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997605-06-28 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997624-06-28 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997629-06-28 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997634-06-29 19:08:168:16.768
NULL
NULL
NULL
NULL
NULL

The data that I index in the _ts field is timestamp in ms (long). It
doesn't seem to be converted correctly but the data
is correct (in ms at least) and I can query against it using date formats
and date math in ES.

Example snippet from debug log from above:
,"_ts":1397130475607}}]}}"

Any ideas or am I doing something silly?

I do see that the Hive timestamp expects either seconds since epoch or a
string-based format that has nanosecond
granularity. Is this the issue with just ms long timestamp data?

Thanks
Nick
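
As a rough illustration of the two representations mentioned in the question above (seconds since epoch, or a string with a nanosecond fraction), here is a small Python sketch that derives both from a millisecond value. The function names are hypothetical, UTC is an assumption, and this says nothing about what es-hadoop or Hive actually do internally.

```python
from datetime import datetime, timezone

def ms_to_seconds_and_nanos(epoch_ms):
    """Split epoch milliseconds into whole seconds and leftover nanoseconds."""
    seconds, millis = divmod(epoch_ms, 1000)
    return seconds, millis * 1_000_000

def ms_to_timestamp_string(epoch_ms):
    """Render epoch milliseconds as 'yyyy-MM-dd HH:mm:ss.fffffffff' (UTC assumed)."""
    seconds, nanos = ms_to_seconds_and_nanos(epoch_ms)
    whole = datetime.fromtimestamp(seconds, timezone.utc)
    return whole.strftime("%Y-%m-%d %H:%M:%S") + ".{:09d}".format(nanos)

ts_ms = 1397130475607  # the "_ts" value from the debug log snippet

print(ms_to_seconds_and_nanos(ts_ms))
print(ms_to_timestamp_string(ts_ms))
```

A conversion along these lines could live in a UDF, so that the unit (ms vs. seconds) and the timezone are stated explicitly rather than guessed.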

On Thu, Mar 27, 2014 at 4:50 PM, Costin Leau <costin.leau@gmail.com> wrote:

Using the latest hive and hadoop is preferred as they contain various bug fixes.
The error suggests a classpath issue - namely the same class is loaded twice for some reason and hence the casting
fails.

Let's connect on IRC - give me a ping when you're available (user is costin).

Cheers,


On 3/27/14 4:29 PM, Nick Pentreath wrote:

    Thanks for the response.

    I tried latest Shark (cdh4 version of 0.9.1 here http://cloudera.rst.im/shark/ ) - this uses hadoop 1.0.4 and hive 0.11
    I believe, and built elasticsearch-hadoop from github master.

    Still getting same error:
    org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit cannot be cast to
    org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit

    Will using hive 0.11 / hadoop 1.0.4 vs hive 0.12 / hadoop 1.2.1 in es-hadoop master make a difference?

    Anyone else actually got this working?



    On Thu, Mar 20, 2014 at 2:44 PM, Costin Leau <costin.leau@gmail.com> wrote:

         I recommend using master - there are several improvements done in this area. Also using the latest Shark (0.9.0) and
         Hive (0.12) will help.

         On 3/20/14 12:00 PM, Nick Pentreath wrote:

             Hi

             I am struggling to get this working too. I'm just trying locally for now, running Shark 0.8.1, Hive 0.9.0 and ES 1.0.1
             with ES-hadoop 1.3.0.M2.

             I managed to get a basic example working with WRITING into an index. But I'm really after READING an index.

             I believe I have set everything up correctly, I've added the jar to Shark:
             ADD JAR /path/to/es-hadoop.jar;

             created a table:
             CREATE EXTERNAL TABLE test_read (name string, price double)
             STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
             TBLPROPERTIES('es.resource' = 'test_index/test_type/_search?q=*');

             And then trying to 'SELECT * FROM test_read' gives me:

             org.apache.spark.SparkException: Job aborted: Task 3.0:0 failed more than 0 times; aborting job
             java.lang.ClassCastException: org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit cannot be cast to
             org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit
             at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
             at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
             at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
             at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
             at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
             at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
             at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
             at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)

             FAILED: Execution Error, return code -101 from shark.execution.SparkTask

             In fact I get the same error thrown when trying to READ from the table that I successfully WROTE to...

             On Saturday, 22 February 2014 12:31:21 UTC+2, Costin Leau wrote:

                  Yeah, it might have been some sort of network configuration issue where services were running on different
                  machines and localhost pointed to a different location.

                  Either way, I'm glad to hear things are moving forward.
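
One quick way to test the "localhost points somewhere else" theory is a plain TCP connection check, run on each machine that executes tasks. The following Python sketch is only an assumption about how one might do that (the thread itself suggests curl or telnet); `can_connect` is a hypothetical helper, and 9200 is the default Elasticsearch HTTP port mentioned in this thread.

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened (hypothetical helper)."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False

# On a worker without a local Elasticsearch, "localhost" is expected to fail
# even though the master can reach its own instance.
print("localhost:9200 reachable:", can_connect("localhost", 9200))
```

If this prints False on the workers but True on the master, the jobs need to be pointed at an address that every node can reach rather than at localhost.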

                  Cheers,

                  On 22/02/2014 1:06 AM, Max Lang wrote:
                  > I managed to get it working on ec2 without issue this time. I'd say the biggest difference was that this time I set up a
> dedicated ES machine. Is it possible that, because I was using a cluster with slaves, when I used "localhost" the slaves
> couldn't find the ES instance running on the master? Or do all the requests go through the master?
>
>
> On Wednesday, February 19, 2014 2:35:40 PM UTC-8, Costin Leau wrote:
>
> Hi,
>
> Setting logging in Hive/Hadoop can be tricky since the log4j needs to be picked up by the running JVM otherwise you
> won't see anything.
> Take a look at this link on how to tell Hive to use your logging settings [1].
>
> For the next release, we might introduce dedicated exceptions for the simple fact that some libraries, like Hive,
> swallow the stack trace and it's unclear what the issue is which makes the exception (IllegalStateException) ambiguous.
>
> Let me know how it goes and whether you will encounter any issues with Shark. Or if you don't :)
>
> Thanks!
>
> [1] https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs
>
> On 20/02/2014 12:02 AM, Max Lang wrote:
> > Hey Costin,
> >
> > Thanks for the swift reply. I abandoned EC2 to take that out of the equation and managed to get everything working
> > locally using the latest version of everything (though I realized just now I'm still on hive 0.9). I'm guessing you're
> > right about some port connection issue because I definitely had ES running on that machine.
> >
> > I changed hive-log4j.properties and added
> > |
> > #custom logging levels
> > #log4j.logger.xxx=DEBUG
> > log4j.logger.org.elasticsearch.hadoop.rest=TRACE
> > log4j.logger.org.elasticsearch.hadoop.mr=TRACE
> > |
> >
> > But I didn't see any trace logging. Hopefully I can get it working on EC2 without issue, but, for the future, is this
> > the correct way to set TRACE logging?
> >
> > Oh and, for reference, I tried running without ES up and I got the following exceptions:
> >
> > 2014-02-19 13:46:08,803 ERROR shark.SharkDriver (Logging.scala:logError(64)) - FAILED: Hive Internal Error:
> > java.lang.IllegalStateException(Cannot discover Elasticsearch version)
> > java.lang.IllegalStateException: Cannot discover Elasticsearch version
> > at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:101)
> > at org.elasticsearch.hadoop.hive.EsStorageHandler.configureOutputJobProperties(EsStorageHandler.java:83)
> > at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:706)
> > at org.apache.hadoop.hive.ql.plan.PlanUtils.configureOutputJobPropertiesForStorageHandler(PlanUtils.java:675)
> > at org.apache.hadoop.hive.ql.exec.FileSinkOperator.augmentPlan(FileSinkOperator.java:764)
> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.putOpInsertMap(SemanticAnalyzer.java:1518)
> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:4337)
> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6207)
> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6138)
> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6764)
> > at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:149)
> > at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:244)
> > at shark.SharkDriver.compile(SharkDriver.scala:215)
> > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
> > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:895)
> > at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:324)
> > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
> > at shark.SharkCliDriver$.main(SharkCliDriver.scala:232)
> > at shark.SharkCliDriver.main(SharkCliDriver.scala)
> > Caused by: java.io.IOException: Out of nodes and retries; caught exception
> > at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
> > at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
> > at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
> > at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
> > at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
> > at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:274)
> > at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:84)
> > at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:99)
> > ... 18 more
> > Caused by: java.net.ConnectException: Connection refused
> > at java.net.PlainSocketImpl.socketConnect(Native Method)
> > at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> > at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> > at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
> > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
> > at java.net.Socket.connect(Socket.java:579)
> > at java.net.Socket.connect(Socket.java:528)
> > at java.net.Socket.<init>(Socket.java:425)
> > at java.net.Socket.<init>(Socket.java:280)
> > at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
> > at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
> > at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
> > at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
> > at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
> > at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
> > at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
> > at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
> > at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
> > ... 25 more
> >
> > Let me know if there's anything in particular you'd like me to try on EC2.
> >
> > (For posterity, the versions I used were: hadoop 2.2.0, hive 0.9.0, shark 8.1, spark 8.1, es-hadoop 1.3.0.M2, java
> > 1.7.0_15, scala 2.9.3, elasticsearch 1.0.0)
> >
> > Thanks again,
> > Max
> >
> > On Tuesday, February 18, 2014 10:16:38 PM UTC-8, Costin Leau wrote:
> >
> > The error indicates a network error - namely es-hadoop cannot connect to Elasticsearch on the default (localhost:9200)
> > HTTP port. Can you double check whether that's indeed the case (using curl or even telnet on that port) - maybe the
> > firewall prevents any connections from being made...
> > Also you could try using the latest Hive, 0.12, and a more recent Hadoop such as 1.1.2 or 1.2.1.
> >
> > Additionally, can you enable TRACE logging in your job on the es-hadoop packages org.elasticsearch.hadoop.rest and
> > org.elasticsearch.hadoop.mr and report back?

                  >     >
                  >     >     Thanks,
                  >     >
                  >     >     On 19/02/2014 4:03 AM, Max Lang wrote:
                  >     >     > I set everything up using this
> > > guide: https://github.com/amplab/shark/wiki/Running-Shark-on-EC2 on an ec2 cluster. I've
> > > copied the elasticsearch-hadoop jars into the hive lib directory and I have elasticsearch running on localhost:9200. I'm
> > > running shark in a screen session with --service screenserver and connecting to it at the same time using shark -h
> > > localhost.
> > >
> > > Unfortunately, when I attempt to write data into elasticsearch, it fails. Here's an example:
> > >
> > > |
> > > [localhost:10000] shark> CREATE EXTERNAL TABLE wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text
> > > STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3n://spark-data/wikipedia-sample/';
> > > Time taken (including network latency): 0.159 seconds
> > > 14/02/19 01:23:33 INFO CliDriver: Time taken (including network latency): 0.159 seconds
> > >
> > > [localhost:10000] shark> SELECT title FROM wiki LIMIT 1;
> > > Alpokalja
> > > Time taken (including network latency): 2.23 seconds
> > > 14/02/19 01:23:48 INFO CliDriver: Time taken (including network latency): 2.23 seconds
> > >
> > > [localhost:10000] shark> CREATE EXTERNAL TABLE es_wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text
> > > STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'wikipedia/article');
> > > Time taken (including network latency): 0.061 seconds
> > > 14/02/19 01:33:51 INFO CliDriver: Time taken (including network latency): 0.061 seconds
> > >
> > > [localhost:10000] shark> INSERT OVERWRITE TABLE es_wiki SELECT w.id, w.title, w.last_modified, w.xml, w.text FROM wiki w;
> > > [Hive Error]: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code -101 from
> > > shark.execution.SparkTask
> > > Time taken (including network latency): 3.575 seconds
> > > 14/02/19 01:34:42 INFO CliDriver: Time taken (including network latency): 3.575 seconds
> > > |
> > >
> > > The stack trace looks like this:
> > >
> > > org.apache.hadoop.hive.ql.metadata.HiveException (org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException:
> > > Out of nodes and retries; caught exception)
> > >
> > > org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:602)
> > > shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:84)
> > > shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:81)
> > > scala.collection.Iterator$class.foreach(Iterator.scala:772)
> > > scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
> > > shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:81)
> > > shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:207)
> > > shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
> > > shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
> > > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
> > > org.apache.spark.scheduler.Task.run(Task.scala:53)
> > > org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
> > > org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
> > > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
> > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > java.lang.Thread.run(Thread.java:744)
> > >
> > > I should be using Hive 0.9.0, shark 0.8.1, elasticsearch 1.0.0, Hadoop 1.0.4, and java 1.7.0_51
> > > Based on my cursory look at the hadoop and elasticsearch-hadoop sources, it looks like hive is just rethrowing an
> > > IOException it's getting from Spark, and elasticsearch-hadoop is just hitting those exceptions.
> > > I suppose my questions are: Does this look like an issue with my ES/elasticsearch-hadoop config? And has anyone gotten
> > > elasticsearch working with Spark/Shark?
> > > Any ideas/insights are appreciated.
> > > Thanks, Max
> > >
> > > --
> > > You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
> > > To view this discussion on the web visit
> > > https://groups.google.com/d/msgid/elasticsearch/9486faff-3eaf-4344-8931-3121bbc5d9c7%40googlegroups.com.
> > > For more options, visit https://groups.google.com/groups/opt_out.

                  >     >
                  >     >     --
                  >     >     Costin
                  >     >
                  >
                  >     --
                  >     Costin
                  >
                  --
                  Costin

             --
             You received this message because you are subscribed to

the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails
from it, send an email to
elasticsearch+unsubscribe@__go__oglegroups.com <
http://googlegroups.com>
<mailto:elasticsearch%2Bunsubscribe@googlegroups.com <mailto:
elasticsearch%252Bunsubscribe@googlegroups.com>
>
<mailto:elasticsearch+____unsubscribe@googlegroups.com
mailto:elasticsearch%2B__unsubscribe@googlegroups.com <mailto:
elasticsearch%2Bunsubscribe@googlegroups.com
mailto:elasticsearch%2Bunsubscribe@googlegroups.com
>>.

             To view this discussion on the web visit
    https://groups.google.com/d/____msgid/elasticsearch/

c1081bf2-____117a-4af2-ba90-2c38a4572782%____40googlegroups.com
<https://groups.google.com/d/__msgid/elasticsearch/c1081bf2-
__117a-4af2-ba90-2c38a4572782%__40googlegroups.com>

    <https://groups.google.com/d/__msgid/elasticsearch/c1081bf2-

__117a-4af2-ba90-2c38a4572782%__40googlegroups.com
<https://groups.google.com/d/msgid/elasticsearch/c1081bf2-
117a-4af2-ba90-2c38a4572782%40googlegroups.com>>

    <https://groups.google.com/d/____msgid/elasticsearch/

c1081bf2-____117a-4af2-ba90-2c38a4572782%_40googlegroups.com?utm
medium=__email&utm_source=__footer
<https://groups.google.com/d/__msgid/elasticsearch/c1081bf2-
__117a-4af2-ba90-2c38a4572782%_40googlegroups.com?utm
medium=__email&utm_source=footer>

    <https://groups.google.com/d/__msgid/elasticsearch/c1081bf2-

__117a-4af2-ba90-2c38a4572782%_40googlegroups.com?utm
medium=__email&utm_source=footer
<https://groups.google.com/d/msgid/elasticsearch/c1081bf2-
117a-4af2-ba90-2c38a4572782%40googlegroups.com?utm_medium=
email&utm_source=footer>>>.

             For more options, visit https://groups.google.com/d/__

__optout https://groups.google.com/d/__optout
<https://groups.google.com/d/__optout <
https://groups.google.com/d/optout>>.

         --
         Costin

         --

         You received this message because you are subscribed to a

topic in the Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/S-
BrzwUHJbM/unsubscribe
<https://groups.google.com/d/topic/elasticsearch/S-
BrzwUHJbM/unsubscribe>
<https://groups.google.com/d/topic/elasticsearch/S-
BrzwUHJbM/unsubscribe
<https://groups.google.com/d/topic/elasticsearch/S-
BrzwUHJbM/unsubscribe>>.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALD%2B6GOd-wSu_XQpc_cjCcv_ZgdiwEoJ6BT6VCAkL4ThvOHTAw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Costin Leau) #14

Could you share your setup and configuration in a gist? The more information the better, especially the versions of the
stack used.
Do you use just the input format or also the output format? To clarify: are you using Spark (Map/Reduce) or Shark (and
the relevant Hive integration in es-hadoop)?

Cheers,

On 5/13/14 10:14 PM, Nick Pentreath wrote:

Ok - well let me know when you're around.

The mapreduce inputformat works fine. I'm using it with Spark to access the ES data via ESInputFormat and run analytics
and machine learning jobs on that data, and the same _ts field works and is the correct data (though it comes through as
org.apache.hadoop.io.Text, which I convert to Long or a DateTime as required).

Perhaps I'm missing it somewhere but is it possible to force a field to be a type? i.e. similar to es.field.mapping,
could I tell it that it must parse the field as a string (since then I can take it and do whatever parsing / casting I
want)?

I could just use the new Spark SQL module (which I'm seriously considering right now having explored it a bit in the
last few days), but some of the stuff we do requires a SQL Console and JDBC, so having Shark able to just pull in ES
data is definitely very useful...

On Tue, May 13, 2014 at 8:18 PM, Costin Leau <costin.leau@gmail.com> wrote:

Hi Nick,

I'm glad to see you are making progress. This week I'm mainly on the road but maybe we can meet on IRC next
week, my invitation still stands :)
Timestamp is a relatively new type and doesn't handle timezones properly - it is backed by java.sql.Timestamp so it
inherits a lot of its issues.
For some reason the year in your date is rather off so it's worth checking the data read by es-hadoop before passing
it to Hive (see [1]).
I've had issues with it myself: the moment the cluster is in a different timezone than the dataset itself,
things get buggy.
Try using a UDF to do the conversion from the long to a timestamp - I've tried doing something similar in our
conversion but since we don't know the timezones used, it's easy for things to get mixed up.
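
As a rough illustration of the long-to-timestamp conversion suggested above, here is a minimal Python sketch that turns an epoch-millisecond value, such as the `_ts` sample quoted in this thread (1397130475607), into a `yyyy-MM-dd HH:mm:ss.SSS` string. It assumes the value really is milliseconds since the epoch and that UTC is the intended timezone; the function name is made up for illustration and is not part of es-hadoop or Hive. An actual Hive UDF would have to do the same arithmetic in Java.

```python
from datetime import datetime, timezone


def epoch_millis_to_timestamp(value):
    """Format epoch milliseconds (int or numeric string) as a UTC timestamp string."""
    millis_total = int(value)  # values read via es-hadoop may arrive as text
    seconds, millis = divmod(millis_total, 1000)  # integer math avoids float rounding
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)  # assumption: UTC is wanted
    return f"{dt:%Y-%m-%d %H:%M:%S}.{millis:03d}"


if __name__ == "__main__":
    # The sample _ts value quoted in this thread.
    print(epoch_millis_to_timestamp("1397130475607"))
```

For the sample value this yields 2014-04-10 11:47:55.607 in UTC. Pinning the timezone explicitly is the point of the sketch, since the cluster and the dataset may not share one.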

Cheers,

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/troubleshooting.html

On 5/13/14 8:25 PM, Nick Pentreath wrote:

    Hi Costin

    Sorry for the silence on this issue. This went a bit quiet.

    But the good news is I've come back to it and managed to get it all working with the new shark 0.9.1 release and
    2.0.0RC1. Actually if I used ADD JAR I got the same exception but when I just put the JAR into the shark lib/ folder
    it worked fine (which seems to point to the classpath issue you mention).

    However, I seem to have an issue with date <-> timestamp conversion.

    I have a field in ES called "_ts" that has type "date" and the default format "dateOptionalTime". When I do a query
    that includes the timestamp it comes back NULL:

    select ts from table ...
    (note I use a correct es.mapping.names to map the _ts field in ES to the ts field in Hive/Shark, which has timestamp
    type).

    below is some of the debug-level output:

    14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997506-06-30 19:08:168:16.768
    14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997605-06-28 19:08:168:16.768
    14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997624-06-28 19:08:168:16.768
    14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997629-06-28 19:08:168:16.768
    14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997634-06-29 19:08:168:16.768
    NULL
    NULL
    NULL
    NULL
    NULL


    The data that I index in the _ts field is a timestamp in ms (long). It doesn't seem to be converted correctly but
    the data is correct (in ms at least) and I can query against it using date formats and date math in ES.

    Example snippet from debug log from above:
    ,"_ts":1397130475607}}]}}"


    Any ideas or am I doing something silly?

    I do see that the Hive timestamp expects either seconds since epoch or a string-based format that has nanosecond
    granularity. Is this the issue with just ms long timestamp data?

    Thanks
    Nick


    On Thu, Mar 27, 2014 at 4:50 PM, Costin Leau <costin.leau@gmail.com> wrote:

         Using the latest hive and hadoop is preferred as they contain various bug fixes.
         The error suggests a classpath issue - namely the same class is loaded twice for some reason and hence the
         casting fails.

         Let's connect on IRC - give me a ping when you're available (user is costin).

         Cheers,


         On 3/27/14 4:29 PM, Nick Pentreath wrote:

             Thanks for the response.

             I tried the latest Shark (cdh4 version of 0.9.1 here http://cloudera.rst.im/shark/ ) - this uses hadoop 1.0.4
             and hive 0.11 I believe - and built elasticsearch-hadoop from github master.

             Still getting the same error:
             org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit cannot be cast to
             org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit

             Will using hive 0.11 / hadoop 1.0.4 vs hive 0.12 / hadoop 1.2.1 in es-hadoop master make a difference?


             Anyone else actually got this working?



             On Thu, Mar 20, 2014 at 2:44 PM, Costin Leau <costin.leau@gmail.com> wrote:

                  I recommend using master - there are several improvements done in this area. Also using the latest
                  Shark (0.9.0) and Hive (0.12) will help.


                  On 3/20/14 12:00 PM, Nick Pentreath wrote:

                      Hi

                      I am struggling to get this working too. I'm just trying locally for now, running Shark 0.8.1,
                      Hive 0.9.0 and ES 1.0.1 with ES-hadoop 1.3.0.M2.

                      I managed to get a basic example working with WRITING into an index. But I'm really after
                      READING an index.

                      I believe I have set everything up correctly, I've added the jar to Shark:
                      ADD JAR /path/to/es-hadoop.jar;

                      created a table:
                      CREATE EXTERNAL TABLE test_read (name string, price double)
                      STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
                      TBLPROPERTIES('es.resource' = 'test_index/test_type/_search?q=*');

                      And then trying to 'SELECT * FROM test_read' gives me:

                      org.apache.spark.SparkException: Job aborted: Task 3.0:0 failed more than 0 times; aborting job
                      java.lang.ClassCastException: org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit cannot be cast to
                      org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit
                      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
                      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
                      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
                      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
                      at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
                      at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
                      at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
                      at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)


                      FAILED: Execution Error, return code -101 from shark.execution.SparkTask


                      In fact I get the same error thrown when trying to READ from the table that I successfully
                      WROTE to...

                      On Saturday, 22 February 2014 12:31:21 UTC+2, Costin Leau wrote:

                           Yeah, it might have been some sort of network configuration issue where services were
                           running on different machines and localhost pointed to a different location.
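
One way to check for this kind of setup problem is to test, from each worker machine, whether the Elasticsearch HTTP port is reachable under the hostname the job is configured with. The following is a minimal Python sketch, assuming the default localhost:9200 mentioned in this thread; `can_connect` is an illustrative helper, not something es-hadoop provides.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused, timeouts and DNS failures
        return False


if __name__ == "__main__":
    # Run this on every node: "localhost" resolves to the node itself,
    # so it only succeeds where Elasticsearch is actually listening.
    print(can_connect("localhost", 9200))
```

If this prints False on the slaves but True on the master, the tasks running on the slaves cannot reach Elasticsearch through "localhost" and need the real hostname of the ES machine instead.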

                           Either way, I'm glad to hear things are moving forward.

                           Cheers,

                           On 22/02/2014 1:06 AM, Max Lang wrote:
                           > I managed to get it working on ec2 without issue this time. I'd say the biggest difference was that this
                           > time I set up a dedicated ES machine. Is it possible that, because I was using a cluster with slaves, when
                           > I used "localhost" the slaves couldn't find the ES instance running on the master? Or do all the requests
                           > go through the master?
                           >
                           >
                           > On Wednesday, February 19, 2014 2:35:40 PM UTC-8, Costin Leau wrote:
                           >
                           >     Hi,
                           >
                           >     Setting logging in Hive/Hadoop can be tricky since the log4j configuration needs to be picked up by the
                           >     running JVM, otherwise you won't see anything.
                           >     Take a look at this link on how to tell Hive to use your logging settings [1].
                           >
                           >     For the next release, we might introduce dedicated exceptions for the simple fact that some libraries,
                           >     like Hive, swallow the stack trace and it's unclear what the issue is, which makes the exception
                           >     (IllegalStateException) ambiguous.
                           >
                           >     Let me know how it goes and whether you will encounter any issues with Shark. Or if you don't :)
                           >
                           >     Thanks!
                           >
                           >

                           >     [1] https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs
                           >
                           >     On 20/02/2014 12:02 AM, Max Lang wrote:
                           >     > Hey Costin,
                           >     >
                           >     > Thanks for the swift reply. I abandoned EC2 to take that out of the equation and managed to get
                           >     > everything working locally using the latest version of everything (though I realized just now I'm
                           >     > still on hive 0.9). I'm guessing you're right about some port connection issue because I definitely
                           >     > had ES running on that machine.
                           >     >
                           >     > I changed hive-log4j.properties and added
                           >     > |
                           >     > #custom logging levels
                           >     > #log4j.logger.xxx=DEBUG
                           >     > log4j.logger.org.elasticsearch.hadoop.rest=TRACE
                           >     > log4j.logger.org.elasticsearch.hadoop.mr=TRACE
                           >     > |
                           >     >
                           >     > But I didn't see any trace logging. Hopefully I can get it working on EC2 without issue, but, for
                           >     > the future, is this the correct way to set TRACE logging?
                           >     >
                           >     > Oh and, for reference, I tried running without ES up and I got the following exceptions:
                           >     >
                           >     > 2014-02-19 13:46:08,803 ERROR shark.SharkDriver (Logging.scala:logError(64)) - FAILED: Hive Internal Error:
                           >     > java.lang.IllegalStateException(Cannot discover Elasticsearch version)
                           >     > java.lang.IllegalStateException: Cannot discover Elasticsearch version
                           >     > at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:101)
                           >     > at org.elasticsearch.hadoop.hive.EsStorageHandler.configureOutputJobProperties(EsStorageHandler.java:83)
                           >     > at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:706)
                           >     > at org.apache.hadoop.hive.ql.plan.PlanUtils.configureOutputJobPropertiesForStorageHandler(PlanUtils.java:675)
                           >     > at org.apache.hadoop.hive.ql.exec.FileSinkOperator.augmentPlan(FileSinkOperator.java:764)
                           >     > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.putOpInsertMap(SemanticAnalyzer.java:1518)
                           >     > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:4337)
                           >     > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6207)
                           >     > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6138)
                           >     > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6764)
                           >     > at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:149)
                           >     > at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:244)
                           >     > at shark.SharkDriver.compile(SharkDriver.scala:215)
                           >     > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
                           >     > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:895)
                           >     > at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:324)
                           >     > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
                           >     > at shark.SharkCliDriver$.main(SharkCliDriver.scala:232)
                           >     > at shark.SharkCliDriver.main(SharkCliDriver.scala)
                           >     > Caused by: java.io.IOException: Out of nodes and retries; caught exception
                           >     > at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
                           >     > at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
                           >     > at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
                           >     > at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
                           >     > at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
                           >     > at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:274)
                           >     > at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:84)
                           >     > at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:99)
                           >     > ... 18 more
                           >     > Caused by: java.net.ConnectException: Connection refused
                           >     > at java.net.PlainSocketImpl.socketConnect(Native Method)
                           >     > at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
                           >     > at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
                           >     > at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
                           >     > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
                           >     > at java.net.Socket.connect(Socket.java:579)
                           >     > at java.net.Socket.connect(Socket.java:528)
                           >     > at java.net.Socket.<init>(Socket.java:425)
                           >     > at java.net.Socket.<init>(Socket.java:280)
                           >     > at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
                           >     > at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
                           >     > at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
                           >     > at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
                           >     > at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
                           >     > at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
                           >     > at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
                           >     > at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
                           >     > at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
                           >     > ... 25 more
                           >     >
                           >     > Let me know if there's anything in particular you'd like me to try on EC2.
                           >     >
                           >     > (For posterity, the versions I used were: hadoop 2.2.0, hive 0.9.0, shark 8.1, spark 8.1, es-hadoop 1.3.0.M2,
                           >     > java 1.7.0_15, scala 2.9.3, elasticsearch 1.0.0)
                           >     >
                           >     > Thanks again,
                           >     > Max
                           >     >
                           >     > On Tuesday, February 18, 2014 10:16:38 PM UTC-8, Costin Leau wrote:
                           >     >
                           >     >     The error indicates a network error - namely es-hadoop cannot connect to Elasticsearch on the default
                           >     >     (localhost:9200) HTTP port. Can you double check whether that's indeed the case (using curl or even
                           >     >     telnet on that port) - maybe the firewall prevents any connections from being made...
                           >     >     Also you could try using the latest Hive, 0.12 and a more recent Hadoop such as 1.1.2 or 1.2.1.
                           >     >
                           >     >     Additionally, can you enable TRACE logging in your job on the es-hadoop packages
                           >     >     org.elasticsearch.hadoop.rest and org.elasticsearch.hadoop.mr and report back?

                           >     >
                           >     >     Thanks,
                           >     >
                           >     >     On 19/02/2014 4:03 AM, Max Lang wrote:
                           >     >     > I set everything up using this guide: https://github.com/amplab/shark/wiki/Running-Shark-on-EC2
                           >     >     > on an EC2 cluster. I've copied the elasticsearch-hadoop jars into the hive lib directory and I
                           >     >     > have elasticsearch running on localhost:9200. I'm running shark in a screen session with
                           >     >     > --service screenserver and connecting to it at the same time using shark -h localhost.
                           >     >     >
                           >     >     > Unfortunately, when I attempt to write data into elasticsearch, it fails. Here's an example:
                           >     >     >
                           >     >     > [localhost:10000] shark> CREATE EXTERNAL TABLE wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING)
                           >     >     >     ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3n://spark-data/wikipedia-sample/';
                           >     >     > Time taken (including network latency): 0.159 seconds
                           >     >     > 14/02/19 01:23:33 INFO CliDriver: Time taken (including network latency): 0.159 seconds
                           >     >     >
                           >     >     > [localhost:10000] shark> SELECT title FROM wiki LIMIT 1;
                           >     >     > Alpokalja
                           >     >     > Time taken (including network latency): 2.23 seconds
                           >     >     > 14/02/19 01:23:48 INFO CliDriver: Time taken (including network latency): 2.23 seconds
                           >     >     >
                           >     >     > [localhost:10000] shark> CREATE EXTERNAL TABLE es_wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING)
                           >     >     >     STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'wikipedia/article');
                           >     >     > Time taken (including network latency): 0.061 seconds
                           >     >     > 14/02/19 01:33:51 INFO CliDriver: Time taken (including network latency): 0.061 seconds
                           >     >     >
                           >     >     > [localhost:10000] shark> INSERT OVERWRITE TABLE es_wiki SELECT w.id, w.title, w.last_modified, w.xml, w.text FROM wiki w;
                           >     >     > [Hive Error]: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
                           >     >     > Time taken (including network latency): 3.575 seconds
                           >     >     > 14/02/19 01:34:42 INFO CliDriver: Time taken (including network latency): 3.575 seconds
                           >     >     >
                           >     >     > *The stack trace looks like this:*
                           >     >     >
                           >     >     > org.apache.hadoop.hive.ql.metadata.HiveException (org.apache.hadoop.hive.ql.metadata.HiveException:
                           >     >     > java.io.IOException: Out of nodes and retries; caught exception)
                           >     >     >
                           >     >     > org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:602)
                           >     >     > shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:84)
                           >     >     > shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:81)
                           >     >     > scala.collection.Iterator$class.foreach(Iterator.scala:772)
                           >     >     > scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
                           >     >     > shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:81)
                           >     >     > shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:207)
                           >     >     > shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
                           >     >     > shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
                           >     >     > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
                           >     >     > org.apache.spark.scheduler.Task.run(Task.scala:53)
                           >     >     > org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
                           >     >     > org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
                           >     >     > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
                           >     >     > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                           >     >     > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                           >     >     > java.lang.Thread.run(Thread.java:744)
                           >     >     >
                           >     >     > I should be using Hive 0.9.0, shark 0.8.1, elasticsearch 1.0.0, Hadoop 1.0.4, and Java 1.7.0_51.
                           >     >     > Based on my cursory look at the hadoop and elasticsearch-hadoop sources, it looks like hive is
                           >     >     > just rethrowing an IOException it's getting from Spark, and elasticsearch-hadoop is just hitting
                           >     >     > those exceptions.
                           >     >     > I suppose my questions are: Does this look like an issue with my ES/elasticsearch-hadoop config?
                           >     >     > And has anyone gotten elasticsearch working with Spark/Shark?
                           >     >     > Any ideas/insights are appreciated.
                           >     >     > Thanks, Max
                           >     >     >
                           >     >     > --
                           >     >     > You received this message because you are subscribed to the Google Groups "elasticsearch" group.
                           >     >     > To unsubscribe from this group and stop receiving emails from it, send an email to
                           >     >     > elasticsearc...@googlegroups.com.
                           >     >     > To view this discussion on the web visit
                           >     >     > https://groups.google.com/d/msgid/elasticsearch/9486faff-3eaf-4344-8931-3121bbc5d9c7%40googlegroups.com.
                           >     >     > For more options, visit https://groups.google.com/groups/opt_out.
                           >     >
                           >     >     --
                           >     >     Costin
                           >     >
                           >
                           >     --
                           >     Costin
                           >
                           --
                           Costin

                      --
                      You received this message because you are subscribed to the Google Groups "elasticsearch" group.
                      To unsubscribe from this group and stop receiving emails from it, send an email to
                      elasticsearch+unsubscribe@__go____oglegroups.com <http://go__oglegroups.com>
    <http://googlegroups.com>
             <mailto:elasticsearch%____2Bunsubscribe@googlegroups.com
    <mailto:elasticsearch%25__2Bunsubscribe@googlegroups.com>
    <mailto:elasticsearch%__252Bunsubscribe@googlegroups.__com
    <mailto:elasticsearch%25252Bunsubscribe@googlegroups.com>>__>
                      <mailto:elasticsearch+______unsubscribe@googlegroups.com
    <mailto:elasticsearch%2B____unsubscribe@googlegroups.com>
             <mailto:elasticsearch%2B____unsubscribe@googlegroups.com
    <mailto:elasticsearch%252B__unsubscribe@googlegroups.com>>
    <mailto:elasticsearch%____2Bunsubscribe@googlegroups.com <mailto:elasticsearch%25__2Bunsubscribe@googlegroups.com>
             <mailto:elasticsearch%__252Bunsubscribe@googlegroups.__com
    <mailto:elasticsearch%25252Bunsubscribe@googlegroups.com>>__>>.


                      To view this discussion on the web visit
    https://groups.google.com/d/______msgid/elasticsearch/__c1081bf2-____117a-4af2-ba90-__2c38a4572782%______40googlegroups.com
    <https://groups.google.com/d/____msgid/elasticsearch/c1081bf2-____117a-4af2-ba90-2c38a4572782%____40googlegroups.com>

    <https://groups.google.com/d/____msgid/elasticsearch/c1081bf2-____117a-4af2-ba90-2c38a4572782%____40googlegroups.com
    <https://groups.google.com/d/__msgid/elasticsearch/c1081bf2-__117a-4af2-ba90-2c38a4572782%__40googlegroups.com>>


    <https://groups.google.com/d/____msgid/elasticsearch/c1081bf2-____117a-4af2-ba90-2c38a4572782%____40googlegroups.com
    <https://groups.google.com/d/__msgid/elasticsearch/c1081bf2-__117a-4af2-ba90-2c38a4572782%__40googlegroups.com>

    <https://groups.google.com/d/__msgid/elasticsearch/c1081bf2-__117a-4af2-ba90-2c38a4572782%__40googlegroups.com
    <https://groups.google.com/d/msgid/elasticsearch/c1081bf2-117a-4af2-ba90-2c38a4572782%40googlegroups.com>>>


    <https://groups.google.com/d/______msgid/elasticsearch/__c1081bf2-____117a-4af2-ba90-__2c38a4572782%______40googlegroups.com?utm_____medium=__email&utm_source=____footer
    <https://groups.google.com/d/____msgid/elasticsearch/c1081bf2-____117a-4af2-ba90-2c38a4572782%____40googlegroups.com?utm___medium=__email&utm_source=__footer>

    <https://groups.google.com/d/____msgid/elasticsearch/c1081bf2-____117a-4af2-ba90-2c38a4572782%____40googlegroups.com?utm___medium=__email&utm_source=__footer
    <https://groups.google.com/d/__msgid/elasticsearch/c1081bf2-__117a-4af2-ba90-2c38a4572782%__40googlegroups.com?utm_medium=__email&utm_source=footer>>


    <https://groups.google.com/d/____msgid/elasticsearch/c1081bf2-____117a-4af2-ba90-2c38a4572782%____40googlegroups.com?utm___medium=__email&utm_source=__footer
    <https://groups.google.com/d/__msgid/elasticsearch/c1081bf2-__117a-4af2-ba90-2c38a4572782%__40googlegroups.com?utm_medium=__email&utm_source=footer>

    <https://groups.google.com/d/__msgid/elasticsearch/c1081bf2-__117a-4af2-ba90-2c38a4572782%__40googlegroups.com?utm_medium=__email&utm_source=footer
    <https://groups.google.com/d/msgid/elasticsearch/c1081bf2-117a-4af2-ba90-2c38a4572782%40googlegroups.com?utm_medium=email&utm_source=footer>>>>.

                      For more options, visit https://groups.google.com/d/______optout
    <https://groups.google.com/d/____optout> <https://groups.google.com/d/____optout
    <https://groups.google.com/d/__optout>>
             <https://groups.google.com/d/____optout <https://groups.google.com/d/__optout>
    <https://groups.google.com/d/__optout <https://groups.google.com/d/optout>>>.


                  --
                  Costin

                  --

                  You received this message because you are subscribed to a topic in the Google Groups
    "elasticsearch" group.
                  To unsubscribe from this topic, visit
    https://groups.google.com/d/______topic/elasticsearch/S-______BrzwUHJbM/unsubscribe
    <https://groups.google.com/d/____topic/elasticsearch/S-____BrzwUHJbM/unsubscribe>
             <https://groups.google.com/d/____topic/elasticsearch/S-____BrzwUHJbM/unsubscribe
    <https://groups.google.com/d/__topic/elasticsearch/S-__BrzwUHJbM/unsubscribe>>
                  <https://groups.google.com/d/____topic/elasticsearch/S-____BrzwUHJbM/unsubscribe
    <https://groups.google.com/d/__topic/elasticsearch/S-__BrzwUHJbM/unsubscribe>
             <https://groups.google.com/d/__topic/elasticsearch/S-__BrzwUHJbM/unsubscribe
    <https://groups.google.com/d/topic/elasticsearch/S-BrzwUHJbM/unsubscribe>>>.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
elasticsearch+unsubscribe@googlegroups.com mailto:elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CALD%2B6GOd-wSu_XQpc_cjCcv_ZgdiwEoJ6BT6VCAkL4ThvOHTAw%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CALD%2B6GOd-wSu_XQpc_cjCcv_ZgdiwEoJ6BT6VCAkL4ThvOHTAw%40mail.gmail.com?utm_medium=email&utm_source=footer.
For more options, visit https://groups.google.com/d/optout.

--
Costin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/537274ED.3010203%40gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Nick Pentreath) #15

Hi Costin

Here is some info on setup, versions, and the simplified version of code /
data: https://gist.github.com/MLnick/bb7c6e87f5c53be2cce4

I am using both Spark (the ESInputFormat directly) and Shark (the ES Hive jar
bundle). Spark works fine, although it returns Text (as opposed to Long or
another typed value), while Shark returns NULLs (with the debug info as per below).

Hopefully this helps. Ping me if you need more info.

N

On Tue, May 13, 2014 at 9:39 PM, Costin Leau costin.leau@gmail.com wrote:

Could you share your setup and configuration in a gist? The more info the
better, especially regarding the versions of the stack used.
Do you use just the input format or also the output format? To clarify:
are you using Spark (Map/Reduce) or Shark (and the relevant
Hive integration in es-hadoop)?

Cheers,

On 5/13/14 10:14 PM, Nick Pentreath wrote:

OK, let me know when you're around.

The MapReduce input format works fine. I'm using it with Spark to access
the ES data via ESInputFormat and run analytics and machine learning jobs on
that data. The same _ts field works there and holds the correct data (though
it comes through as org.apache.hadoop.io.Text, which I convert to Long or a
DateTime as required).

Perhaps I'm missing it somewhere, but is it possible to force a field to
be a given type? That is, similar to es.field.mapping, could I tell it that
it must parse the field as a string (since I can then do whatever parsing or
casting I want)?

I could just use the new Spark SQL module (which I'm seriously
considering right now, having explored it a bit in the last few days), but
some of the stuff we do requires a SQL console and JDBC, so having Shark able
to just pull in ES data is definitely very useful...

On Tue, May 13, 2014 at 8:18 PM, Costin Leau <costin.leau@gmail.com> wrote:

Hi Nick,

I'm glad to see you are making progress. This week I'm mainly on the road,
but maybe we can meet on IRC next week; my invitation still stands :)

Timestamp is a relatively new type and doesn't handle timezones properly. It
is backed by java.sql.Timestamp, so it inherits a lot of its issues.
For some reason the year in your date is rather off, so it's worth checking
the data read by es-hadoop before passing it to Hive (see [1]).
I've had issues with it myself: the moment the cluster is in a different
timezone than the dataset itself, things get buggy.
Try using a UDF to do the conversion from the long to a timestamp. I've tried
doing something similar in our conversion, but since we don't know the
timezones used, it's easy for things to get mixed up.

Cheers,

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/troubleshooting.html
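As an illustration of the long-to-timestamp conversion discussed above, here is a minimal Python sketch. It is not es-hadoop or Hive code, and it is not a Hive UDF; it only shows the arithmetic such a conversion would perform on an epoch-milliseconds value like the _ts sample from the debug log in this thread. The function name is hypothetical, and interpreting the value as UTC is an assumption, since the thread notes that the dataset's timezone is not known.

```python
from datetime import datetime, timedelta, timezone

# Reference point for epoch-based values (assumed to be UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_timestamp_string(epoch_ms: int) -> str:
    """Format epoch milliseconds as 'YYYY-MM-DD HH:MM:SS.mmm' in UTC."""
    # Integer timedelta arithmetic avoids float rounding of the milliseconds.
    moment = EPOCH + timedelta(milliseconds=epoch_ms)
    return moment.strftime("%Y-%m-%d %H:%M:%S") + f".{epoch_ms % 1000:03d}"


if __name__ == "__main__":
    # Sample _ts value taken from the debug log quoted in this thread.
    print(epoch_ms_to_timestamp_string(1397130475607))
```

Checking a few raw values this way before they reach Hive may help tell whether the data itself is sound and only the type conversion is off.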

On 5/13/14 8:25 PM, Nick Pentreath wrote:

Hi Costin

Sorry for the silence on this issue. This went a bit quiet.

But the good news is I've come back to it and managed to get it all working
with the new Shark 0.9.1 release and 2.0.0RC1. Actually, if I used ADD JAR I
got the same exception, but when I just put the JAR into the Shark lib/ folder
it worked fine (which seems to point to the classpath issue you mention).

However, I seem to have an issue with date <-> timestamp conversion.

I have a field in ES called "_ts" that has type "date" and the default format
"dateOptionalTime". When I do a query that includes the timestamp it comes
back NULL:

select ts from table ...

(Note that I use a correct es.mapping.names to map the _ts field in ES to the
ts field in Hive/Shark, which has timestamp type.)

Below is some of the debug-level output:

14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997506-06-30 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997605-06-28 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997624-06-28 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997629-06-28 19:08:168:16.768
14/05/13 19:19:47 DEBUG lazy.LazyPrimitive: Data not in the TIMESTAMP data type range so converted to null. Given data is :96997634-06-29 19:08:168:16.768
NULL
NULL
NULL
NULL
NULL

The data that I index in the _ts field is a timestamp in ms (long). It doesn't
seem to be converted correctly, but the data is correct (in ms at least) and I
can query against it using date formats and date math in ES.

Example snippet from the debug log above:
,"_ts":1397130475607}}]}}"

Any ideas, or am I doing something silly?

I do see that the Hive timestamp expects either seconds since epoch or a
string-based format that has nanosecond granularity. Is this the issue with
just ms long timestamp data?

Thanks
Nick


On Thu, Mar 27, 2014 at 4:50 PM, Costin Leau <costin.leau@gmail.com> wrote:

Using the latest Hive and Hadoop is preferred as they contain various bug
fixes.
The error suggests a classpath issue, namely that the same class is loaded
twice for some reason and hence the casting fails.

Let's connect on IRC. Give me a ping when you're available (user is costin).

Cheers,


On 3/27/14 4:29 PM, Nick Pentreath wrote:

Thanks for the response.

I tried the latest Shark (the cdh4 version of 0.9.1 from
http://cloudera.rst.im/shark/ ), which I believe uses Hadoop 1.0.4 and Hive
0.11, and built elasticsearch-hadoop from GitHub master.

Still getting the same error:
org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit cannot be cast to org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit

Will using Hive 0.11 / Hadoop 1.0.4 vs Hive 0.12 / Hadoop 1.2.1 in es-hadoop
master make a difference?

Has anyone else actually got this working?



On Thu, Mar 20, 2014 at 2:44 PM, Costin Leau <costin.leau@gmail.com> wrote:

I recommend using master, as several improvements have been made in this
area. Using the latest Shark (0.9.0) and Hive (0.12) will also help.

On 3/20/14 12:00 PM, Nick Pentreath wrote:

Hi

I am struggling to get this working too. I'm just trying locally for now,
running Shark 0.8.1, Hive 0.9.0 and ES 1.0.1 with ES-hadoop 1.3.0.M2.

I managed to get a basic example working with WRITING into an index. But I'm
really after READING an index.

I believe I have set everything up correctly. I've added the jar to Shark:
ADD JAR /path/to/es-hadoop.jar;

created a table:
CREATE EXTERNAL TABLE test_read (name string, price double)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'test_index/test_type/_search?q=*');

And then trying to 'SELECT * FROM test_read' gives me:

org.apache.spark.SparkException: Job aborted: Task 3.0:0 failed more than 0 times; aborting job java.lang.ClassCastException: org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit cannot be cast to org.elasticsearch.hadoop.hive.EsHiveInputFormat$ESHiveSplit
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)

FAILED: Execution Error, return code -101 from shark.execution.SparkTask

In fact I get the same error thrown when trying to READ from the table that I
successfully WROTE to...

On Saturday, 22 February 2014 12:31:21 UTC+2, Costin Leau wrote:

Yeah, it might have been some sort of network configuration issue where
services were running on different machines and localhost pointed to a
different location.

Either way, I'm glad to hear things are moving forward.

Cheers,

On 22/02/2014 1:06 AM, Max Lang wrote:

I managed to get it working on EC2 without issue this time. I'd say the
biggest difference was that this time I set up a dedicated ES machine. Is it
possible that, because I was using a cluster with slaves, when I used
"localhost" the slaves couldn't find the ES instance running on the master?
Or do all the requests go through the master?
>
>
On Wednesday, February 19, 2014 2:35:40 PM UTC-8, Costin Leau wrote:

Hi,

Setting up logging in Hive/Hadoop can be tricky since the log4j configuration
needs to be picked up by the running JVM, otherwise you won't see anything.
Take a look at this link on how to tell Hive to use your logging settings [1].

For the next release, we might introduce dedicated exceptions for the simple
fact that some libraries, like Hive, swallow the stack trace and it's unclear
what the issue is, which makes the exception (IllegalStateException)
ambiguous.

Let me know how it goes and whether you encounter any issues with Shark. Or
if you don't :)

Thanks!

[1] https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs
>
On 20/02/2014 12:02 AM, Max Lang wrote:

Hey Costin,

Thanks for the swift reply. I abandoned EC2 to take that out of the equation
and managed to get everything working locally using the latest version of
everything (though I realized just now I'm still on Hive 0.9). I'm guessing
you're right about some port connection issue because I definitely had ES
running on that machine.

I changed hive-log4j.properties and added

#custom logging levels
#log4j.logger.xxx=DEBUG
log4j.logger.org.elasticsearch.hadoop.rest=TRACE
log4j.logger.org.elasticsearch.hadoop.mr=TRACE

But I didn't see any trace logging. Hopefully I can get it working on EC2
without issue, but, for the future, is this the correct way to set TRACE
logging?
> >
Oh and, for reference, I tried running without ES up and I got the following
exceptions:

2014-02-19 13:46:08,803 ERROR shark.SharkDriver (Logging.scala:logError(64)) - FAILED: Hive Internal Error: java.lang.IllegalStateException(Cannot discover Elasticsearch version)
java.lang.IllegalStateException: Cannot discover Elasticsearch version
at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:101)
at org.elasticsearch.hadoop.hive.EsStorageHandler.configureOutputJobProperties(EsStorageHandler.java:83)
at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:706)
at org.apache.hadoop.hive.ql.plan.PlanUtils.configureOutputJobPropertiesForStorageHandler(PlanUtils.java:675)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.augmentPlan(FileSinkOperator.java:764)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.putOpInsertMap(SemanticAnalyzer.java:1518)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:4337)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6207)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6138)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6764)
at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:149)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:244)
at shark.SharkDriver.compile(SharkDriver.scala:215)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:895)
at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:324)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:232)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.io.IOException: Out of nodes and retries; caught exception
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:274)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:84)
at org.elasticsearch.hadoop.hive.EsStorageHandler.init(EsStorageHandler.java:99)
... 18 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:280)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
... 25 more
                           >     >
Let me know if there's anything in particular you'd like me to try on EC2.

(For posterity, the versions I used were: hadoop 2.2.0, hive 0.9.0, shark 8.1,
spark 8.1, es-hadoop 1.3.0.M2, java 1.7.0_15, scala 2.9.3, elasticsearch 1.0.0)

Thanks again,
Max
> >
> > On Tuesday, February 18, 2014
10:16:38 PM UTC-8, Costin Leau wrote:
> >
> > The error indicates a network
error - namely es-hadoop cannot connect to
Elasticsearch
on the
default (localhost:9200)
> > HTTP port. Can you double
check whether that's indeed the case (using curl or
even
telnet on
that port) - maybe the
> > firewall prevents any
connections to be made...
> > Also you could try using the
latest Hive, 0.12 and a more recent Hadoop such
as 1.1.2
or 1.2.1.
> >
> > Additionally, can you enable TRACE logging in your job on es-hadoop
> > packages org.elasticsearch.hadoop.rest and org.elasticsearch.hadoop.mr
> > packages and report back?
> >
> > Thanks,
> >
> > On 19/02/2014 4:03 AM, Max Lang wrote:
> > > I set everything up using this guide:
> > > https://github.com/amplab/shark/wiki/Running-Shark-on-EC2 on an ec2
> > > cluster. I've copied the elasticsearch-hadoop jars into the hive lib
> > > directory and I have elasticsearch running on localhost:9200. I'm
> > > running shark in a screen session with --service screenserver and
> > > connecting to it at the same time using shark -h localhost.
> > >
> > > Unfortunately, when I attempt to write data into elasticsearch, it
> > > fails. Here's an example:
> > >
> > > [localhost:10000] shark> CREATE EXTERNAL TABLE wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3n://spark-data/wikipedia-sample/';
> > > Time taken (including network latency): 0.159 seconds
> > > 14/02/19 01:23:33 INFO CliDriver: Time taken (including network latency): 0.159 seconds
> > >
> > > [localhost:10000] shark> SELECT title FROM wiki LIMIT 1;
> > > Alpokalja
> > > Time taken (including network latency): 2.23 seconds
> > > 14/02/19 01:23:48 INFO CliDriver: Time taken (including network latency): 2.23 seconds
> > >
> > > [localhost:10000] shark> CREATE EXTERNAL TABLE es_wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'wikipedia/article');
> > > Time taken (including network latency): 0.061 seconds
> > > 14/02/19 01:33:51 INFO CliDriver: Time taken (including network latency): 0.061 seconds
> > >
> > > [localhost:10000] shark> INSERT OVERWRITE TABLE es_wiki SELECT w.id, w.title, w.last_modified, w.xml, w.text FROM wiki w;
> > > [Hive Error]: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
> > > Time taken (including network latency): 3.575 seconds
> > > 14/02/19 01:34:42 INFO CliDriver: Time taken (including network latency): 3.575 seconds
> > >
> > > The stack trace looks like this:
> > >
> > > org.apache.hadoop.hive.ql.metadata.HiveException (org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Out of nodes and retries; caught exception)
> > >
> > > org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:602)
> > > shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:84)
> > > shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:81)
> > > scala.collection.Iterator$class.foreach(Iterator.scala:772)
> > > scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
> > > shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:81)
> > > shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:207)
> > > shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
> > > shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:211)
> > > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
> > > org.apache.spark.scheduler.Task.run(Task.scala:53)
> > > org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
> > > org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
> > > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
> > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > java.lang.Thread.run(Thread.java:744)
> > >
> > > I should be using Hive 0.9.0, shark 0.8.1, elasticsearch 1.0.0, Hadoop
> > > 1.0.4, and java 1.7.0_51
> > > Based on my cursory look at the hadoop and elasticsearch-hadoop
> > > sources, it looks like hive is just rethrowing an IOException it's
> > > getting from Spark, and elasticsearch-hadoop is just hitting those
> > > exceptions.
> > > I suppose my questions are: Does this look like an issue with my
> > > ES/elasticsearch-hadoop config? And has anyone gotten elasticsearch
> > > working with Spark/Shark?
> > > Any ideas/insights are appreciated.
> > > Thanks, Max
> > >


(system) #16