[Hadoop][pig] How to set the document id?

Niels_Basjes · December 1, 2013, 3:21pm

I've been doing some PoC applications using the ES STORE functionality that allows me to load data from Pig directly into ES.
In some cases, after running a job, I want to reprocess the same data with some small changes to the job. In those scenarios I simply want to update the existing documents with a newer version (having a few changed fields).

I think the way to do that is by manually setting the document id. So far I haven't been able to figure out how to set the document id in my Pig script.

A simple working example would really help me out.

Thanks.

Niels Basjes

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3766096f-d5b9-4b79-8adf-cb1f44dfe2b1%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

costin · December 1, 2013, 9:01pm

This functionality is available in master. Define the 'es.mapping.id' property, which can point to either a field name or a constant value (specified in <>).
From the tests:
B = FOREACH A GENERATE id, name, TOBAG(url, picture) AS links;
STORE B INTO 'pig/update' USING org.elasticsearch.hadoop.pig.ESStorage('es.mapping.id=id');
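To make the es.mapping.id mechanism concrete, here is a minimal Python sketch of what one stored tuple roughly looks like when it reaches Elasticsearch's bulk API with an explicit id. It is a simplification, not es-hadoop's actual code: it assumes each Pig tuple becomes one JSON document, that 'pig/update' means index "pig" and type "update", and that the value of the field named by es.mapping.id is copied into the _id of the bulk action line. The function name bulk_index_entry and the sample values are hypothetical.

```python
import json


def bulk_index_entry(doc, index, doc_type, id_field=None):
    """Build the two newline-delimited JSON lines of one bulk "index" action.

    Hypothetical, simplified model of es.mapping.id: when id_field names a
    field of the document, its value is used as the document _id, so the same
    input always maps to the same Elasticsearch document. Without it, no _id
    is sent and Elasticsearch generates a new one on every write.
    """
    meta = {"_index": index, "_type": doc_type}
    if id_field is not None:
        meta["_id"] = str(doc[id_field])
    return json.dumps({"index": meta}) + "\n" + json.dumps(doc) + "\n"


# Hypothetical tuple from relation B: (id, name, links)
doc = {"id": 42, "name": "example", "links": ["http://example.com", "pic.png"]}
print(bulk_index_entry(doc, "pig", "update", id_field="id"), end="")
# {"index": {"_index": "pig", "_type": "update", "_id": "42"}}
# {"id": 42, "name": "example", "links": ["http://example.com", "pic.png"]}
```

Because the same id value always yields the same _id, re-running the job targets the existing documents instead of creating new ones.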

Note that you can also specify the write operation: index (the default), create, or update (doc upsert). For your case, the default should work just fine.
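Why the default index operation is enough for re-running a job can be seen with a small in-memory model. This Python sketch is a toy, not Elasticsearch: the class name InMemoryIndex is hypothetical, a rejected create simply raises KeyError (Elasticsearch returns an error for that document instead), and update does a shallow merge with upsert behaviour, standing in for the "doc upsert" mentioned above.

```python
class InMemoryIndex:
    """Toy stand-in for one index/type, keyed by document _id (hypothetical)."""

    def __init__(self):
        self.docs = {}

    def write(self, op, doc_id, source):
        if op == "index":
            # Replace the whole document, or add it if the _id is new.
            self.docs[doc_id] = dict(source)
        elif op == "create":
            # Refuse to overwrite an existing document.
            if doc_id in self.docs:
                raise KeyError(f"document {doc_id!r} already exists")
            self.docs[doc_id] = dict(source)
        elif op == "update":
            # Shallow merge of the new fields; create the document if missing
            # (upsert). Real Elasticsearch merges nested objects too.
            self.docs.setdefault(doc_id, {}).update(source)
        else:
            raise ValueError(f"unknown operation {op!r}")


store = InMemoryIndex()
first_run = [{"id": "1", "clicks": 3}, {"id": "2", "clicks": 5}]
second_run = [{"id": "1", "clicks": 4}, {"id": "2", "clicks": 6}]  # same ids, changed field
for run in (first_run, second_run):
    for d in run:
        store.write("index", d["id"], d)
print(store.docs)  # {'1': {'id': '1', 'clicks': 4}, '2': {'id': '2', 'clicks': 6}}
```

With explicit ids, the second run leaves two documents holding the newer values; without them, each run would add a fresh set of documents.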

Cheers,


Niels_Basjes · December 2, 2013, 7:13pm

Thanks, I'll try to build a version of the plugin from master later this week.

Niels


costin · December 2, 2013, 7:42pm

The snapshots are published daily to a Maven repository, so you can just point your Maven tool to it and you're ready to go [1].

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/install.html

--
Costin


Niels_Basjes · December 2, 2013, 7:53pm

Hi,

Thanks.
From Pig I simply need the jar with all the stuff, and Maven doesn't give me that (at least not easily).
I did some browsing and found this, which is just as good for me:
http://oss.sonatype.org/content/repositories/snapshots/org/elasticsearch/elasticsearch-hadoop/1.3.0.BUILD-SNAPSHOT/
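The snapshot URL above follows the standard Maven 2 repository layout, which is what a Maven-compatible tool relies on when pointed at the repository. This small Python sketch (the helper name maven_version_dir is hypothetical) derives that directory from the artifact coordinates visible in the URL: groupId org.elasticsearch, artifactId elasticsearch-hadoop, version 1.3.0.BUILD-SNAPSHOT. For snapshot versions the jar file names inside the directory typically carry a build timestamp, which is one reason letting a Maven-aware tool resolve them is easier than picking the file by hand.

```python
def maven_version_dir(repo_url, group_id, artifact_id, version):
    """Return the directory of one artifact version in a Maven 2 layout repository.

    Layout: <repo>/<groupId with '.' replaced by '/'>/<artifactId>/<version>/
    """
    group_path = group_id.replace(".", "/")
    return f"{repo_url.rstrip('/')}/{group_path}/{artifact_id}/{version}/"


print(maven_version_dir(
    "http://oss.sonatype.org/content/repositories/snapshots",
    "org.elasticsearch", "elasticsearch-hadoop", "1.3.0.BUILD-SNAPSHOT",
))
```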


costin · December 2, 2013, 8:08pm

Not sure what you mean by "the jar with all the stuff" - I was suggesting using a Maven-compatible tool to download the jar easily rather than manually.
You can then manipulate the jar any way you want.

Cheers,


--
Costin


Related topics

Topic	Category	Replies	Views	Activity
[Hadoop][pig] How to set the document id in ESStorage?	Elasticsearch	3	388	July 6, 2017
Setting id of document with elasticsearch-hadoop that is not in source document	Elasticsearch	6	507	July 6, 2017
[Hadoop] Setting Document ID in Map Reduce Mapper	Elasticsearch	5	962	July 6, 2017
[hadoop] Push _id to ES via PIG ESStorage	Elasticsearch	3	333	July 6, 2017
[hadoop] Extra Documents in Elastic Search	Elasticsearch	3	356	July 6, 2017