Need help for Hadoop and ES integration


(Chetana) #1

I have downloaded elasticsearch-hadoop-1.2.0.jar from GitHub and am trying to
run a search. The code looks like this:


Configuration conf = new Configuration();
conf.set("es.nodes", "localhost");
conf.set("es.port", "9200");
conf.set("es.resource", "test_index/test_mapping");
conf.set("es.query", "{\"query\":{\"term\":{\"test.field1\":\"test\"}}}");
Job job = Job.getInstance(conf);
job.setJobName(ESIndexContoller.class.getSimpleName());
job.setJarByClass(ESIndexContoller.class);

job.setInputFormatClass(EsInputFormat.class);
job.setOutputFormatClass(EsOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(MapWritable.class);
job.setSpeculativeExecution(false);
job.waitForCompletion(true);


I have Hadoop 2.2 from Hortonworks and ES 0.90.2. When I run the 'hadoop
jar' command, it throws 'IllegalStateException: Job in state DEFINE instead
of RUNNING'.

I believe one needs to create a custom Mapper and Reducer. If so, can someone
tell me what the code inside map() and reduce() should look like?
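
For reference, the es.query value in the snippet above is only a valid Java string literal if the inner double quotes are escaped. Here is a minimal sketch using only the JDK. The class name EsQueryString and the helper termQuery are hypothetical, not part of elasticsearch-hadoop, and the helper does not escape its arguments, so it only suits simple field names and values like the ones in this post:

```java
// Hypothetical helper for illustration; not part of elasticsearch-hadoop.
class EsQueryString {
    // Builds the JSON body of a term query. The arguments are inserted
    // as-is, so they must not contain double quotes or backslashes.
    static String termQuery(String field, String value) {
        return "{\"query\":{\"term\":{\"" + field + "\":\"" + value + "\"}}}";
    }

    public static void main(String[] args) {
        // The same query as in the post, written with escaped quotes.
        String query = termQuery("test.field1", "test");
        System.out.println(query);
    }
}
```

The resulting string is what would be passed as the second argument to conf.set("es.query", ...).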

Thanks,

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c70d3f45-27a7-4f38-9615-7cce19c956c4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Costin Leau) #2

You are using the wrong version of elasticsearch-hadoop. Double-check the docs [1].

P.S. I recommend upgrading to the latest ES 0.90 (or even 1.1.0) and, if possible, using the latest snapshot of master,
downloadable through Maven [2]. Note that there is no need for the yarn classifier in that case; there is only one jar,
and it works across both Hadoop 1 and 2.

Cheers,

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/index.html
[2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/install.html

--
Costin



(M_20) #3

Hi Chetana,

Could you please share some sample Java code for MapReduce with
Elasticsearch-Hadoop?

Regards



(Costin Leau) #4

M_20, I've already replied to your initial query about where you can find some
examples: the official docs,
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/mapreduce.html
A Google search also turns up other resources outside Elasticsearch.



(M_20) #5

Costin,

Thank you for your reply. I had already read the official docs, but it seems
I am missing something, so I wanted to see a complete example to make sure
I understand ES-Hadoop correctly.
For example, the section of the official docs on writing data to ES shows
this mapper:
/////////////////////////////////////

public void map(Object key, Object value, OutputCollector output,
        Reporter reporter) throws IOException {
    // create the MapWritable object
    MapWritable doc = new MapWritable();
    ...
    // write the result to the output collector
    // one can pass whatever value to the key; EsOutputFormat ignores it
    output.collect(NullWritable.get(), map);
}}
/////////////////////////////////////

Could you please tell me what map refers to in
output.collect(NullWritable.get(), map);

Also, could you please tell me a little about "MapWritable doc"? What format should it have?

Regards



(Costin Leau) #6

Quoting from the same docs again:
"EsOutputFormat expects a Map<Writable, Writable> representing a document
value that is converted internally into a JSON document and indexed in
Elasticsearch. Hadoop OutputFormat requires implementations to expect a key
and a value however, since for Elasticsearch only the document (that is the
value) is necessary, EsOutputFormat ignores the key."

As for MapWritable, it is Hadoop's Map implementation; other than that, it's
just a map. It is provided by Hadoop, not by es-hadoop. I
recommend spending some time getting familiar with Hadoop, since es-hadoop
leverages many of its classes and concepts; after all, it's a dedicated
'connector' to Elasticsearch for Hadoop.
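
To make the quoted passage concrete: each document is a map of field names to values, and that map becomes one JSON object. The sketch below illustrates only that correspondence. It uses a plain java.util.Map as a stand-in for Hadoop's MapWritable so that it runs without Hadoop on the classpath. The DocSketch class, its toJson method, and the field names are all hypothetical; es-hadoop performs its own conversion internally. The map variable in the docs snippet presumably refers to the MapWritable built a few lines earlier (named doc there), though that reading is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: java.util.Map stands in for Hadoop's MapWritable.
class DocSketch {
    // Renders a map as a JSON object. Handles strings, numbers, booleans
    // and nested maps; any other value is rendered as a quoted toString().
    static String toJson(Map<String, Object> doc) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append(quote(e.getKey())).append(':').append(render(e.getValue()));
        }
        return sb.append('}').toString();
    }

    @SuppressWarnings("unchecked")
    private static String render(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        if (v instanceof Map) return toJson((Map<String, Object>) v);
        return quote(v.toString());
    }

    // Escapes backslashes and double quotes; control characters are not handled.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // One "document": field names mapped to values, in insertion order.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("field1", "test");
        doc.put("count", 3);
        System.out.println(toJson(doc)); // prints {"field1":"test","count":3}
    }
}
```

In a real job the mapper would fill a MapWritable in the same way, one entry per field, and emit it as the value; as the quote says, EsOutputFormat ignores the key.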

P.S. In the future, please don't hijack old threads but rather start your
own - it's easier for everyone.


