Extra documents in Elastic Search

Napoleon_T · April 23, 2014, 9:13pm

Hi,

I'm trying to store a large number of documents in ES using Pig. The Pig job
finishes successfully, but I end up with more documents in Elasticsearch than
there are rows in my input.
My Pig script is 3 lines:
REGISTER 'local/path/to/m2.jar';
data = load 'path/to/hdfs/file.tsv' as (field1: chararray, field2: long, field3: long, field4: long);
store data into 'index/type' using org.elasticsearch.hadoop.pig.EsStorage('es.nodes=node2.domain.com', 'es.resource=index/type');

I have speculative execution disabled for map and reduce when running this
pig script.

Hadoop states that 54,723,557 records were written (console output and job
tracker UI).
ES head plugin claims that I have docs: 57,344,987 (57,344,987).
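
To cross-check the head plugin figure independently, the count could also be read from the Elasticsearch `_count` endpoint and compared with the Hadoop number. This is only a minimal sketch: the HTTP port 9200 is an assumption (the default), and the helper names `count_url`, `parse_count` and `surplus` are hypothetical, made up for illustration.

```python
import json
from urllib.request import urlopen

# Records written according to the Hadoop console output / job tracker UI.
HADOOP_RECORDS = 54_723_557


def count_url(node, resource, port=9200):
    """Build the _count URL for an 'index/type' resource (port 9200 is assumed)."""
    return "http://{}:{}/{}/_count".format(node, port, resource)


def parse_count(body):
    """Extract the document count from a _count JSON response body."""
    return json.loads(body)["count"]


def surplus(es_docs, hadoop_records=HADOOP_RECORDS):
    """Documents present in ES beyond what Hadoop reports as written."""
    return es_docs - hadoop_records


if __name__ == "__main__":
    # Ask the cluster directly instead of relying on the head plugin display.
    with urlopen(count_url("node2.domain.com", "index/type")) as resp:
        es_docs = parse_count(resp.read().decode("utf-8"))
    print("ES docs:", es_docs, "surplus:", surplus(es_docs))
```

With the two figures quoted above, the surplus works out to 2,621,430 documents.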

My environment:
Hadoop: 1.2.1, 6-node cluster
Elasticsearch: 1.0.0, 6-node cluster (separate from the Hadoop nodes)
elasticsearch-hadoop: version M2
Pig: 0.12.0

Any idea what is going on here?

Thanks.

