[Hadoop][pig] How to set the document id?


(Niels Basjes) #1

I've been doing some PoC applications using the ES STORE functionality that allows me to load data from Pig directly into ES.
In some cases, after running a job, I want to redo the same data but with some small changes to the job. In those scenarios I simply want to update the existing documents with a newer version (having a few changed fields).

I think the way to do that is by manually setting the document id. So far I've not been able to figure out how to set the document id from my Pig script.

A simple working example would really help me out.

Thanks.

Niels Basjes

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3766096f-d5b9-4b79-8adf-cb1f44dfe2b1%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Costin Leau) #2

This functionality is available in master. Define the 'es.mapping.id' property,
which can point to either a field name or an absolute value (specified in <>).
From the tests:
B = FOREACH A GENERATE id, name, TOBAG(url, picture) AS links;
STORE B INTO 'pig/update' USING org.elasticsearch.hadoop.pig.ESStorage('es.mapping.id=id');

Note that you can also specify the write operation: index (default), create,
or update (doc upsert). For your case, the default should work just fine.
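
For illustration, here is a slightly fuller sketch of the snippet above as a complete script. The jar path, the input file, and its column layout are hypothetical placeholders that do not come from the tests; only the FOREACH and STORE lines do. It assumes the ESStorage class from the snapshot build discussed in this thread.

```pig
-- Hypothetical local path to the elasticsearch-hadoop jar; adjust to your setup.
REGISTER /path/to/elasticsearch-hadoop.jar;

-- Hypothetical input: a tab-separated file with these four columns.
A = LOAD 'input/artists.tsv' USING PigStorage('\t')
    AS (id:long, name:chararray, url:chararray, picture:chararray);

B = FOREACH A GENERATE id, name, TOBAG(url, picture) AS links;

-- 'es.mapping.id=id' uses the 'id' field of each tuple as the document id.
STORE B INTO 'pig/update' USING org.elasticsearch.hadoop.pig.ESStorage('es.mapping.id=id');
```

With the default index operation, running the script again over the same ids should replace the existing documents rather than create new ones. In released versions of elasticsearch-hadoop the write operation is chosen with the 'es.write.operation' setting; that the snapshot build uses the same name is an assumption.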

Cheers,

On Sun, Dec 1, 2013 at 5:21 PM, Niels Basjes niels@basj.es wrote:

I've been doing some PoC applications using the ES STORE functionality
that allows me to load data from Pig directly into ES.
In some cases, after running a job, I want to redo the same data but with
some small changes to the job. In those scenarios I simply want to update
the existing documents with a newer version (having a few changed fields).

I think the way to do that is by manually setting the document id. So
far I've not been able to figure out how to set the document id from my Pig
script.

A simple working example would really help me out.

Thanks.

Niels Basjes



(Niels Basjes) #3

Thanks, I'll try to build a version of the plugin from master later this week.

Niels



(Costin Leau) #4

The snapshots are published daily to a Maven repo, so you can just point your Maven tool at it and you are ready to go [1]

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/install.html

On 02/12/2013 9:13 PM, Niels Basjes wrote:

Thanks, I'll try to build a version of the plugin from master later this week.

Niels

--
Costin



(Niels Basjes) #5

Hi,

On Monday, 2 December 2013 at 20:42:28 UTC+1, Costin Leau wrote:

The snapshots are published daily to a Maven repo, so you can just point
your Maven tool at it and you are ready to go [1]

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/install.html

Thanks.
From Pig I simply need the jar with all the stuff, and Maven doesn't give me
that (not easily).
I did some browsing and found this, which is just as good for me:
http://oss.sonatype.org/content/repositories/snapshots/org/elasticsearch/elasticsearch-hadoop/1.3.0.BUILD-SNAPSHOT/



(Costin Leau) #6

Not sure what you mean by "the jar with all the stuff". I was suggesting using a Maven-compatible tool to download the
jar easily rather than manually.
You can then manipulate the jar any way you want.

Cheers,

On 02/12/2013 9:53 PM, Niels Basjes wrote:

Hi,

On Monday, 2 December 2013 at 20:42:28 UTC+1, Costin Leau wrote:

The snapshots are published daily to a Maven repo, so you can just point your Maven tool at it and you are ready to go [1]

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/install.html

Thanks.
From Pig I simply need the jar with all the stuff, and Maven doesn't give me that (not easily).
I did some browsing and found this, which is just as good for me:
http://oss.sonatype.org/content/repositories/snapshots/org/elasticsearch/elasticsearch-hadoop/1.3.0.BUILD-SNAPSHOT/


--
Costin


