PIG: Reindexing documents with _source disabled


(Tyler C.) #1

My question is about using Pig to update ES. If my index has _source
disabled (I only wish to count the number of documents matching a query),
how does one reindex a document?
Here's how I plan to use Pig and Elasticsearch:

1. Use ES-pig to initially create an index of 10,000,000 documents, with
   _source disabled.
2. One week later, index or reindex 10,000,000 documents, where maybe 95%
   of the docs overlap with the original set.

Is this possible with _source disabled? In normal operation, update/upsert
operations are not possible without _source, and presumably this extends to
the elasticsearch-hadoop bridges.
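
For readers following along, here is a minimal sketch of the setup described above: an index-creation body with _source disabled and a body for the _count endpoint. It assumes the typeless mapping layout of recent Elasticsearch releases (older releases nest the setting under a type name), and the field name "status" is purely hypothetical.

```python
import json


def source_disabled_mapping():
    """Index-creation body that turns _source off.

    Assumption: typeless mapping layout; older Elasticsearch versions
    nest "_source" under a mapping type name instead.
    """
    return {"mappings": {"_source": {"enabled": False}}}


def count_body(field, value):
    """Body for the _count endpoint: a term query on one field.

    The field and value passed in are illustrative, not from the thread.
    """
    return {"query": {"term": {field: value}}}


if __name__ == "__main__":
    # Print the two request bodies as JSON for inspection.
    print(json.dumps(source_disabled_mapping()))
    print(json.dumps(count_body("status", "active")))
```

Counting only needs the inverted index, so a setup like this can answer the query without storing the original documents.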

T.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/89ed687e-fc5a-4dd5-b81d-310b3c1fe363%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Binh Ly) #2

Tyler, no experience with Pig here. However, from the ES perspective, you
will need the full JSON document to be able to index anything into ES. This
JSON document can come from anywhere external to ES, or from the _source
field (if enabled) of an already existing ES index. And even if _source was
available in an ES index, you'd still need to read it out and then
push/index it back in (whether to the same or to a new index) if you want
to do a "reindex".
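
As a rough illustration of "the full JSON document comes from outside ES", the sketch below builds a bulk request body from externally held documents, using a stable document ID so that re-sending a document replaces the earlier copy wholesale. This is a full replacement through the index action, not a partial update, so it does not read _source. The index name "events", the "id" field, and the helper name are assumptions made for the example, not part of the original discussion.

```python
import json


def bulk_index_body(docs, index_name, id_field="id"):
    """Build a newline-delimited bulk body of 'index' actions.

    Each document is preceded by an action line carrying an explicit _id,
    so indexing the same ID again overwrites the previous document.
    The caller must supply the complete document each time.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name, "_id": str(doc[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical weekly batch: id 1 overlaps with last week's data, id 2 is new.
    batch = [{"id": 1, "status": "active"}, {"id": 2, "status": "closed"}]
    print(bulk_index_body(batch, "events"), end="")
```

With the weekly batch held in Pig or HDFS, the same idea applies: the overlapping 95% are simply indexed again under their existing IDs.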


