I've been building some PoC applications using the ES STORE functionality that allows me to load data from Pig directly into ES.
In some cases, after running a job, I want to reprocess the same data with some small changes to the job. In those scenarios I simply want to update the existing documents with a newer version (having a few changed fields).
I think the way to do that is by manually setting the document id. So far I've not been able to figure out how to set the document id from my Pig script.
A simple working example would really help me out.
This functionality is available in master. Define the 'es.mapping.id' property,
which can point to either a field name or an absolute value (specified in <>).
From the tests:
B = FOREACH A GENERATE id, name, TOBAG(url, picture) AS links;
STORE B INTO 'pig/update' USING org.elasticsearch.hadoop.pig.ESStorage('es.mapping.id=id');
Note that you can also specify the write operation: index (default), create,
or update (doc upsert). For your case, the default should work just fine.
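To put the snippet above in context, here is a minimal sketch of a complete script. The jar path, input file, and schema are hypothetical and only there for illustration. The write operation is assumed to be set through the 'es.write.operation' property, as named in the es-hadoop configuration documentation; check that against the snapshot you are running.

```pig
-- Hypothetical jar path; point this at the es-hadoop snapshot jar you downloaded.
REGISTER /path/to/elasticsearch-hadoop.jar;

-- Hypothetical input file and schema, for illustration only.
A = LOAD 'input/artists.tsv' USING PigStorage('\t')
    AS (id:long, name:chararray, url:chararray, picture:chararray);

B = FOREACH A GENERATE id, name, TOBAG(url, picture) AS links;

-- es.mapping.id=id uses the 'id' field of each tuple as the document id,
-- so re-running the job replaces the existing documents instead of adding new ones.
-- es.write.operation=index is the default; it is spelled out here only for clarity.
STORE B INTO 'pig/update' USING org.elasticsearch.hadoop.pig.ESStorage(
    'es.mapping.id=id',
    'es.write.operation=index');
```

With the default index operation, each re-run replaces the whole document that has the same id, which matches the "redo the same data with a few changed fields" case.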
Cheers,
On Sun, Dec 1, 2013 at 5:21 PM, Niels Basjes niels@basj.es wrote:
I've been building some PoC applications using the ES STORE functionality
that allows me to load data from Pig directly into ES.
In some cases, after running a job, I want to reprocess the same data with
some small changes to the job. In those scenarios I simply want to update
the existing documents with a newer version (having a few changed fields).
I think the way to do that is by manually setting the document id. So
far I've not been able to figure out how to set the document id from my Pig
script.
A simple working example would really help me out.
Not sure what you mean by "the jar with all the stuff" - I was suggesting using a maven-compatible tool to download the
jar easily rather than manually.
You can then manipulate the jar any way you want.
Cheers,
On 02/12/2013 9:53 PM, Niels Basjes wrote:
Hi,
On Monday, 2 December 2013 at 20:42:28 UTC+1, Costin Leau wrote:
The snapshots are published daily to a maven repo, so you can just point your maven tool to it and you are ready to go [1].
[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/install.html