[hadoop] Pipelining Hadoop/Spark with ElasticSearch

Han_JU · October 24, 2013, 4:53pm

Hi,

I'm trying to write Hadoop aggregation results to ES.
Say I have K and V as the key and value classes respectively. According to the
elasticsearch-hadoop API/blog, the key is ignored and the value should be a
Map<K, V>.
I'm a little confused here: do I need an extra map job to convert my
(K, V) to (Null, Map<K, V>)?
Are there any complete examples of using Hadoop and ES together?

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

mattweber · October 24, 2013, 5:04pm

Have you tried the elasticsearch-hadoop plugin? There is good
documentation on the website.

Thanks,
Matt Weber

costin · October 24, 2013, 5:05pm

Hi,

I replied on IRC but you left. See the docs here [1]. The value represents your document, and since it might contain
multiple fields, ESOutputFormat expects a Map (MapWritable) which contains the actual document. Say your doc is something
like { foo: 123 }; then your map would be [new Text("foo"): new LongWritable(123)].

The docs provide more information about the Writable types supported (basically all of them) and their equivalent ES types.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/mapreduce.html
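
To make the document shape concrete, here is a minimal sketch of how one (K, V) aggregation result can be folded into a single document map at the point where it would be emitted, which typically avoids a separate conversion job. It deliberately uses plain java.util types as stand-ins so it runs without Hadoop on the classpath; in an actual job the map would be a MapWritable holding Text / LongWritable entries as described above. The class name DocumentSketch and the field names "word" and "count" are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: plain java.util types stand in for Hadoop Writables
// (String for Text, Long for LongWritable, Map for MapWritable).
final class DocumentSketch {

    // Turn one aggregation result (key, value) into a single document.
    // The field names "word" and "count" are hypothetical.
    static Map<String, Object> toDocument(String key, long value) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("word", key);    // the original key becomes an ordinary field
        doc.put("count", value); // the aggregated value becomes another field
        return doc;
    }

    public static void main(String[] args) {
        // In a real job this map would be the *value* written by the reducer;
        // the output key is ignored.
        Map<String, Object> doc = toDocument("foo", 123L);
        System.out.println(doc); // prints {word=foo, count=123}
    }
}
```

The point of the sketch is only that the original key, if it is needed in the index, has to be copied into the document as a field, since the output key itself is dropped.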

Han_JU · October 25, 2013, 1:26pm

Thanks. It seems I misunderstood something.

I have now managed to push documents to ES, and I'd like to know whether the following are
supported by the current version of elasticsearch-binding:

- specifying the id when indexing. Right now the "_id" of the pushed documents is
  auto-generated
- the update API

Thanks.

costin · October 25, 2013, 2:10pm

On 25/10/2013 4:26 PM, Han JU wrote:

> Thanks. Seems like I misunderstand something.
>
> Now I managed to push documents to ES, and I'd like to know if these are supported by current version of
> elasticsearch-binding:

I assume you mean elasticsearch-hadoop.

> - specifying id for index. Now the "_id" for the documents pushed are auto generated
> - the update api

This is currently being worked on and we should have something in trunk by next week.

aarthi1890 · September 10, 2014, 6:01am

Hi

Wanted to know if the auto generated id has been committed.

Thanks
Aarthi

aarthi1890 · September 10, 2014, 6:02am

Hi

Just wanted to know whether the code for changing the auto-generated id has been
committed, or is it yet to be changed? I am using the
elasticsearch-spark_2.10 Beta 1 version.

Thanks
Aarthi

costin · September 10, 2014, 8:53am

Specifying the id for each document has been possible for quite some time now, through the es.mapping.id parameter [1] - simply point
it to the field containing the ID and you're good to go.
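
As a rough illustration of what "point it to the field containing the ID" means, the sketch below models the effect of the setting with plain java.util types. It is not the connector's code. The setting names es.resource and es.mapping.id are real; the values ("stats/words", "word"), the class name MappingIdSketch and the helper resolveId are assumptions made up for this example. In a Hadoop job the same key/value pairs would normally be set on the job's Configuration instead of a Map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: models the effect of es.mapping.id, not the connector's code.
final class MappingIdSketch {

    // Job settings; "es.resource" and "es.mapping.id" are real setting names,
    // the values ("stats/words", "word") are hypothetical.
    static Map<String, String> settings() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("es.resource", "stats/words"); // target index/type
        conf.put("es.mapping.id", "word");      // document field holding the ID
        return conf;
    }

    // Look up the value that would be used as _id for a given document.
    // Returns null when no ID field is configured or the document lacks it.
    static Object resolveId(Map<String, String> conf, Map<String, Object> doc) {
        String idField = conf.get("es.mapping.id");
        return idField == null ? null : doc.get(idField);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("word", "foo");
        doc.put("count", 123L);
        System.out.println(resolveId(settings(), doc)); // prints foo
    }
}
```

With the setting absent, no field is singled out and the "_id" stays auto-generated, which is the behaviour reported earlier in the thread.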

aarthi1890 · September 10, 2014, 10:29am

Thanks for the reply Costin.
