HOWTO for testing loading data from HDFS to ES?


(bob.webman) #1

Hi Guys,
I have ES installed on 16 nodes in my Cloudera Hadoop cluster. All looks
good from an ES point of view.

I now want to test a very simple data load from hdfs to ES but am
struggling. I am using PIG and have elasticsearch-hadoop installed.

I want to load a single pipe-delimited text file that is already on
HDFS:

$ hdfs dfs -ls /logfiles/20140820
Found 1 items
-rw-r--r-- 3 bob supergroup 2015426946 2014-08-20 06:45 /logfiles/20140820

Can someone help me with a really simple test using PIG? When I run the
following, the job fails:

grunt> DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage();
grunt> data = load '/logfiles/20140820' using PigStorage('\n');
grunt> B = foreach data generate $0 as id;
grunt> STORE B INTO 'esTEST' using EsStorage('es.http.timeout = 5m');

Failed!

Failed Jobs:
JobId Alias Feature Message Outputs
job_201408191741_0008 B,data MAP_ONLY Message: Job failed!
esTEST,

Input(s):
Failed to read data from "/logfiles/20140820"

Output(s):
Failed to produce result in "esTEST"

Can someone help a noob out with some simple PIG just to check I have it
working?
Thanks
Paul

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/11125bff-de59-4cb2-a42c-3f1f37a39f70%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(bob.webman) #2

In case anyone wants to try this next time:

The STORE target has to be in the form [index]/[type], not just a single name.

So change:

STORE B INTO 'esTEST' using EsStorage('es.http.timeout = 5m');

to

STORE B INTO 'demo/esTEST' using EsStorage('es.http.timeout = 5m');

All working now
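
For anyone who wants the whole test in one place, here is a rough sketch that writes the corrected Pig script to a file and runs it if the pig client is available. A few things in it are assumptions and not from the thread above: the file name es_load_test.pig, the commented-out REGISTER line with its placeholder jar path, and the switch to PigStorage('|') (chosen only because the file is pipe-delimited, so $0 becomes the first field). The index name 'demo' is simply the one used above; note that Elasticsearch index names must be lowercase.

```shell
#!/bin/sh
# Write a minimal Pig script for the load test.
# The quoted heredoc ('EOF') keeps $0 literal instead of expanding it in the shell.
cat > es_load_test.pig <<'EOF'
-- Placeholder path (assumption); uncomment if the jar is not already on Pig's classpath.
-- REGISTER /path/to/elasticsearch-hadoop.jar;
DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage();
-- Assumption: split on '|' because the file is described as pipe-delimited.
data = LOAD '/logfiles/20140820' USING PigStorage('|');
B = FOREACH data GENERATE $0 AS id;
-- The target is [index]/[type]: 'demo' is the index, 'esTEST' the type.
STORE B INTO 'demo/esTEST' USING EsStorage('es.http.timeout = 5m');
EOF

# Run the script only where the pig client is installed.
if command -v pig >/dev/null 2>&1; then
  pig -f es_load_test.pig
else
  echo "pig not found; wrote es_load_test.pig"
fi
```

The only change that mattered for the failure above is the 'demo/esTEST' target; the rest is the original script tidied up.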


(system) #3