[hadoop] Pipelining Hadoop/Spark with ElasticSearch

Hi,

I'm trying to write Hadoop aggregation results to ES.
Say I have K and V as the key and value classes respectively. According to the
elasticsearch-hadoop API/blog, the key is ignored and the value should be a
Map<K, V>.
I'm a little confused here: do I need an extra map job to convert my
(K, V) to (Null, Map<K, V>)?
Are there any complete examples of using Hadoop and ES together?

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Have you tried the elasticsearch-hadoop plugin? There is good
documentation on the website.

Thanks,
Matt Weber

On Thu, Oct 24, 2013 at 9:53 AM, Han JU ju.han.felix@gmail.com wrote:

> Are there any complete examples of using Hadoop and ES together?

Hi,

I replied on IRC but you left. See the docs here [1]. The value represents your document, and since it might contain
multiple fields, ESOutputFormat expects a Map (MapWritable) which contains the actual document. Say your doc is something
like { foo: 123 }; then your map would be [new Text("foo"): new LongWritable(123)].

The docs provide more information about the Writable types supported (basically all of them) and their equivalent ES types.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/mapreduce.html
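
As a rough illustration of the reply above, the sketch below builds one document per aggregated (key, value) pair. It is plain Java so that it runs without Hadoop on the classpath: Map<String, Object> stands in for MapWritable, String for Text, and Long for LongWritable. The class name EsDocumentSketch and the field names "word" and "count" are assumptions made up for this example, not part of elasticsearch-hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-in for the Hadoop types: Map<String, Object> plays the role
// of MapWritable, String the role of Text, and Long the role of LongWritable.
final class EsDocumentSketch {

    // Turn one aggregated (key, value) pair into a single document.
    // The field names "word" and "count" are hypothetical.
    static Map<String, Object> toDocument(String key, long value) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("word", key);    // the former key becomes an ordinary field
        doc.put("count", value); // the former value becomes another field
        return doc;
    }

    public static void main(String[] args) {
        // Prints the document that would be handed over as the output value.
        System.out.println(toDocument("foo", 123L));
    }
}
```

In a real job the same few lines would presumably sit in the existing reducer, which would emit the MapWritable as its output value, so a separate map job should not be needed. Check the docs at [1] for the exact output key and value types.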


--
Costin


Thanks. It seems I misunderstood something.

I have now managed to push documents to ES, and I'd like to know whether these are
supported by the current version of elasticsearch-binding:

  • specifying the id when indexing. Currently the "_id" of the pushed documents is
    auto-generated
  • the update API

Thanks.

On Thursday, 24 October 2013 at 19:05:31 UTC+2, Costin Leau wrote:

> The docs provide more information about the Writable types supported
> (basically all of them) and their equivalent ES types.

On 25/10/2013 4:26 PM, Han JU wrote:

> Thanks. It seems I misunderstood something.
>
> I have now managed to push documents to ES, and I'd like to know whether these are supported by the current version of
> elasticsearch-binding:

I assume you mean elasticsearch-hadoop.

> • specifying the id when indexing. Currently the "_id" of the pushed documents is auto-generated
> • the update API

This is currently being worked on and we should have something in trunk by next week.

> Thanks.


--
Costin


Hi

I wanted to know whether support for specifying the id (instead of the auto-generated one) has been committed.

Thanks
Aarthi

On Friday, 25 October 2013 19:40:05 UTC+5:30, Costin Leau wrote:

> This is being currently worked on and we should have something in trunk by
> next week.

Hi

I just wanted to know whether the code for changing the auto-generated id has
been committed, or whether it is yet to be changed. I am using the
elasticsearch-spark_2.10.Beta 1 version.

Thanks
Aarthi


It has been possible to specify the id for each document for quite some time now, through the es.mapping.id parameter [1]:
simply point it to the field containing the ID and you're good to go.
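
To make the setting concrete, here is a minimal sketch of the key/value pairs involved. It collects them in a plain java.util.Map so it runs without Hadoop or Spark on the classpath; in an actual job the same keys would presumably be set on the job configuration. The class name EsIdSettingsSketch, the resource "wordcount/result" and the field name "word" are assumptions made up for this example. es.resource is, as far as I know, the connector's setting for the target index and type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: the settings are collected in a plain Map so the example runs
// without Hadoop or Spark; a real job would set the same keys on its own
// job configuration instead.
final class EsIdSettingsSketch {

    // resource: target "index/type" to write to
    // idField:  name of the document field whose value should become the _id
    static Map<String, String> settings(String resource, String idField) {
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("es.resource", resource);
        cfg.put("es.mapping.id", idField);
        return cfg;
    }

    public static void main(String[] args) {
        // "wordcount/result" and "word" are made-up names for illustration.
        settings("wordcount/result", "word")
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

With a setting like this, each document would need to carry the named field (here "word"), and its value would be used as the document id instead of an auto-generated one.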

On 9/10/14 9:02 AM, aarthi ranganathan wrote:

> Just wanted to know if the code for changing auto generated id has been committed or is it yet to be changed?

--
Costin


Thanks for the reply Costin.

On Wednesday, 10 September 2014 14:23:57 UTC+5:30, Costin Leau wrote:

> One can specify the id for each document for quite some time now, through es.mapping.id parameter [...]