[Hadoop] storing data in ES using pig script


(hanine) #1

Hello,

I'm trying to store data in ES (head) using a Pig script, and it gives me:

Input(s):
Failed to read data from "/user/hive/warehouse/books"

Output(s):
Failed to produce result in "books/book"

I'd be very thankful if someone could help me.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/979f5688-bd53-4b76-a97a-5b0359c8be75%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Costin Leau) #2

Hi,

That isn't a lot of information, so it's hard to figure out what's actually wrong; one can only guess. Can you post your
stack trace/logs and your Pig script somewhere, such as a gist?

One thing that stands out is that you mention you are using Pig, yet your path points to a Hive warehouse:

    Failed to read data from "/user/hive/warehouse/books"

From this I infer that the issue may be that you are trying to read a Hive internal file, which Pig
can't understand, leading to the error that you see.
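
As a rough illustration of that guess: a Hive text table created without an explicit delimiter separates fields with Ctrl-A (\u0001), not tabs, so a Pig LOAD has to name that delimiter to split the columns. The sketch below assumes the table is stored as plain text; the two-column schema is hypothetical, and other Hive storage formats would need a different loader.

```pig
-- Sketch only: assumes the Hive table is stored as plain text with Hive's
-- default field delimiter (Ctrl-A, \u0001). The schema below is hypothetical.
books = LOAD '/user/hive/warehouse/books' USING PigStorage('\u0001')
    AS (title:chararray, author:chararray);

-- Print a few rows to confirm the fields were split as expected.
sample_rows = LIMIT books 5;
DUMP sample_rows;
```

If the dumped rows show all values packed into the first field, the delimiter is probably not the one assumed here.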

Cheers,


--
Costin



(hanine) #3

Hi,

Here are my log and my Pig script.

Log file (backend error message):

java.io.IOException: java.io.IOException: Out of nodes and retries; caught exception
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:469)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: Out of nodes and retries; caught exception
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
    at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
    at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:85)
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:60)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$ESRecordWriter.init(EsOutputFormat.java:165)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$ESRecordWriter.write(EsOutputFormat.java:147)
    at org.elasticsearch.hadoop.pig.EsStorage.putNext(EsStorage.java:188)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:586)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:467)
    ... 11 more
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at java.net.Socket.<init>(Socket.java:425)
    at java.net.Socket.<init>(Socket.java:280)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:79)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:121)
    at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:706)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:386)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
    at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
    ... 25 more

Pig script:

REGISTER /home/hduser/hadoop/lib/elasticsearch-hadoop-1.2.0.jar;
weblogs = LOAD '/user/hive/warehouse/weblogs' USING PigStorage('\t')
    AS (client_ip : chararray,
        full_request_date : chararray,
        day : int,
        month : chararray,
        month_num : int,
        year : int,
        hour : int,
        minute : int,
        second : int,
        timezone : chararray,
        http_verb : chararray,
        uri : chararray,
        http_status_code : chararray,
        bytes_returned : chararray,
        referrer : chararray,
        user_agent : chararray
    );
weblog_group = GROUP weblogs by (client_ip, year, month_num);
weblog_count = FOREACH weblog_group GENERATE group.client_ip, group.year,
    group.month_num, COUNT_STAR(weblogs) as pageviews;

STORE weblog_count INTO 'weblogs2/logs2' USING
    org.elasticsearch.hadoop.pig.EsStorage();

Whatever path I put in the LOAD statement, I get the same result, even if I use
the path to my desktop.

Thanks



(Costin Leau) #4

Since you are not specifying the network configuration for an Elasticsearch
node, the connector defaults to localhost:9200. This works as long as you are
running Hadoop (Pig, Hive, Cascading, etc.) on the same machine as
Elasticsearch; based on your exception, that is unlikely to be the case.
Try specifying the es.nodes parameter; see the documentation for more
information.

Additionally, you seem to be using the wrong es-hadoop jar: in your
script you register es-hadoop-1.2.0.jar (which does not support the
Pig/Hive/Cascading functionality), while the stack trace indicates you are
using es-hadoop-1.3.X.jar.

Make sure you are using es-hadoop-1.3.0.M3.jar (which is released and
available in Maven Central) and no other version. I recommend starting with
the examples in the reference docs, which show how to easily load and store
data to/from Elasticsearch.
Once that works, consider extending your script.
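
To make the two suggestions concrete, here is a minimal sketch of a script that registers a single es-hadoop jar and passes the node address to EsStorage as settings. The jar path, input file, two-column schema and the host name es-host.example.com are all placeholders, not values from this thread; es.nodes and es.port are es-hadoop configuration settings, but the exact names accepted should be checked against the documentation for the jar version in use.

```pig
-- Sketch only: jar path, input file, schema and host name are placeholders.
REGISTER /path/to/elasticsearch-hadoop-1.3.0.M3.jar;

logs = LOAD '/tmp/pageviews.tsv' USING PigStorage('\t')
    AS (client_ip:chararray, pageviews:long);

-- es.nodes / es.port point the connector at a remote Elasticsearch node
-- instead of the localhost:9200 default.
STORE logs INTO 'weblogs2/logs2' USING
    org.elasticsearch.hadoop.pig.EsStorage('es.nodes=es-host.example.com', 'es.port=9200');
```

The host given here must be reachable from every Hadoop task node, since the write happens inside the reduce tasks and not on the machine that submits the script.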

Hope this helps,



(hanine) #5

OK, thank you so much.



(hanine) #6

Hello,

I used "elasticsearch-hadoop-1.3.0.M2" and it gave me:

Failed Jobs:
JobId:   job_201404142111_0008
Alias:   weblog_count,weblog_group,weblogs
Feature: GROUP_BY,COMBINER
Message: Job failed! Error - # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201404142111_0008_r_000000
Outputs: weblogs/logs2,

Input(s):
Failed to read data from "/user/hive/warehouse/weblogs"

Output(s):
Failed to produce result in "weblogs/logs2"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

I think it's better to understand how things work from the beginning, so could
you please tell me what I should start with: what do I need to do to configure
Elasticsearch (head) with Hadoop, and how can I work with Elasticsearch head?

Thank you so much; everything you say is really helpful.

2014-04-14 14:09 GMT+01:00 hanine haninne haninne.5@gmail.com:

Ok ,thank you so much

2014-04-14 9:33 GMT+01:00 Costin Leau costin.leau@gmail.com:

Since you are not specifying the network configuration for an

elasticsearch node, it will default to localhost:9200. This works as long
as you are running Hadoop (Pig, Hive, Cascading, etc...) on the same
machine as Elasticsearch - based on your exception that is unlikely the
case.
Try specifying the es.nodes parameter - see the documentation for more
information.

Additionally, you seem to be using the wrong jar of es-hadoop - in your
script you are registering es-hadoop-1.2.0.jar (which does not support the
pig/hive/cascading functionality) while the stacktrace indicate you are
using es-hadoop-1.3.X.jar.

Make sure you are using es-hadoop-1.3.0.M3.jar (which is released and
available in Maven Central) and no other version. I recommend starting with
the examples in the reference docs, which show to easily load and store
data to/from Elasticsearch.
Once that works, consider extending your script.

Hope this helps,

On Mon, Apr 14, 2014 at 11:23 AM, hanine haninne haninne.5@gmail.comwrote:

Hi,

Here is my log and my script Pig

log file :
Backend error message

java.io.IOException: java.io.IOException: Out of nodes and retries;
caught exception
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:469)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: Out of nodes and retries; caught
exception
at
org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
at
org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
at
org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
at
org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
at
org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:85)
at
org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:60)
at
org.elasticsearch.hadoop.mr.EsOutputFormat$ESRecordWriter.init(EsOutputFormat.java:165)
at
org.elasticsearch.hadoop.mr.EsOutputFormat$ESRecordWriter.write(EsOutputFormat.java:147)
at org.elasticsearch.hadoop.pig.EsStorage.putNext(EsStorage.java:188)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:586)
at
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:467)
... 11 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:280)
at
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:79)
at
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:121)
at
org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:706)
at
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:386)
at
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
at
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
at
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
at
org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
at
org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
... 25 more

Pig script:
REGISTER /home/hduser/hadoop/lib/elasticsearch-hadoop-1.2.0.jar;
weblogs = LOAD '/user/hive/warehouse/weblogs' USING PigStorage('\t')
AS (client_ip : chararray,
full_request_date : chararray,
day : int,
month : chararray,
month_num : int,
year : int,
hour : int,
minute : int,
second : int,
timezone : chararray,
http_verb : chararray,
uri : chararray,
http_status_code : chararray,
bytes_returned : chararray,
referrer : chararray,
user_agent : chararray
);
weblog_group = GROUP weblogs by (client_ip, year, month_num);
weblog_count = FOREACH weblog_group GENERATE group.client_ip,
group.year, group.month_num, COUNT_STAR(weblogs) as pageviews;

STORE weblog_count INTO 'weblogs2/logs2' USING
org.elasticsearch.hadoop.pig.EsStorage();

And whatever I put in the LOAD, it gives me the same result, even if I
put the path of my desktop.

Thx




(hanine) #7

Hello,
I used "elasticsearch-hadoop-1.3.0.M2"
and it gave me:

Failed Jobs:
JobId Alias Feature Message Outputs
job_201404142111_0008 weblog_count,weblog_group,weblogs GROUP_BY,COMBINER Message: Job failed! Error - # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201404142111_0008_r_000000 weblogs/logs2,

Input(s):
Failed to read data from "/user/hive/warehouse/weblogs"

Output(s):
Failed to produce result in "weblogs/logs2"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

I think it is better to understand how things work from the beginning, so
could you please tell me what I should start with, what I should do to
configure Elasticsearch (head) with Hadoop, and how I can work with
elasticsearch-head?

Thank you so much, everything you say is really helpful. Thank you.

On Monday, 14 April 2014 09:33:12 UTC+1, Costin Leau wrote:

Since you are not specifying the network configuration for an
Elasticsearch node, es-hadoop will default to localhost:9200. This works as
long as you are running Hadoop (Pig, Hive, Cascading, etc.) on the same
machine as Elasticsearch; based on your exception, that is unlikely to be
the case.
Try specifying the es.nodes parameter - see the documentation for more
information.

Additionally, you seem to be using the wrong es-hadoop jar: in your
script you are registering es-hadoop-1.2.0.jar (which does not support the
Pig/Hive/Cascading functionality), while the stacktrace indicates you are
using es-hadoop-1.3.X.jar.

Make sure you are using es-hadoop-1.3.0.M3.jar (which is released and
available in Maven Central) and no other version. I recommend starting with
the examples in the reference docs, which show how to easily load and store
data to/from Elasticsearch.
Once that works, consider extending your script.

Hope this helps,
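
As a rough sketch of the advice above (not taken from the thread itself), the script below registers the 1.3.0.M3 jar and passes es.nodes to EsStorage. The jar path and the host name es-host.example.com are assumptions made for illustration; replace them with the values for your own setup.

```pig
-- Sketch only: the jar location is an assumption; adjust it to wherever
-- the es-hadoop 1.3.0.M3 jar actually lives on your machine.
REGISTER /home/hduser/hadoop/lib/elasticsearch-hadoop-1.3.0.M3.jar;

-- Same input path and schema as the script posted earlier in this thread.
weblogs = LOAD '/user/hive/warehouse/weblogs' USING PigStorage('\t')
    AS (client_ip:chararray, full_request_date:chararray, day:int,
        month:chararray, month_num:int, year:int, hour:int, minute:int,
        second:int, timezone:chararray, http_verb:chararray, uri:chararray,
        http_status_code:chararray, bytes_returned:chararray,
        referrer:chararray, user_agent:chararray);

-- Count page views per client and month.
weblog_group = GROUP weblogs BY (client_ip, year, month_num);
weblog_count = FOREACH weblog_group GENERATE
    group.client_ip AS client_ip,
    group.year AS year,
    group.month_num AS month_num,
    COUNT_STAR(weblogs) AS pageviews;

-- es.nodes tells es-hadoop where Elasticsearch listens; without it the
-- connector tries localhost:9200. The host below is a placeholder.
STORE weblog_count INTO 'weblogs2/logs2' USING
    org.elasticsearch.hadoop.pig.EsStorage('es.nodes=es-host.example.com:9200');
```

If the job still fails with "Connection refused", it may be worth checking that Elasticsearch is reachable on that host and port from the machines running the Hadoop tasks, not only from the machine where the script is submitted.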



(Costin Leau) #8

Glad to hear it, but note that the latest release is 1.3.0.M3. Simply check the official project page [1] and you will
find all the info [2], including the download setup from Maven, for both stable and dev/snapshot releases [3].

[1] http://www.elasticsearch.org/overview/hadoop/
[2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/index.html
[3] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/install.html


--
Costin



(system) #9