Extra documents in Elasticsearch


(Napoleon T.) #1

Hi,

I'm trying to store a lot of documents into ES using Pig. The Pig job ends
successfully, but I end up with more documents in Elasticsearch than the
number of rows in my input.
My Pig script is 3 lines:
REGISTER 'local/path/to/m2.jar';
data = load 'path/to/hdfs/file.tsv' as (field1: chararray, field2: long,
field3: long, field4: long);
store data into 'index/type' using
org.elasticsearch.hadoop.pig.EsStorage('es.nodes=node2.domain.com',
'es.resource=index/type');

I have speculative execution disabled for both map and reduce tasks when
running this Pig script.

Hadoop states that 54,723,557 records were written (console output and job
tracker UI).
ES head plugin claims that I have docs: 57,344,987 (57,344,987).
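
To put the gap in perspective, here is a small sketch of the arithmetic using only the two counts quoted above. The variable names are mine and purely illustrative; nothing here queries Hadoop or Elasticsearch.

```python
# Counts copied from the post: Hadoop's "records written" counter
# and the document count shown by the ES head plugin.
hadoop_records_written = 54_723_557
es_doc_count = 57_344_987

# Number of documents in the index beyond what Hadoop reports writing.
extra_docs = es_doc_count - hadoop_records_written

# Surplus expressed as a percentage of the input size.
extra_pct = 100.0 * extra_docs / hadoop_records_written

print(f"extra documents: {extra_docs:,}")
print(f"surplus relative to input: {extra_pct:.2f}%")
```

So the index holds roughly 2.6 million more documents than the job reports writing, a surplus of a little under 5%.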

My environment:
Hadoop: 1.2.1, 6-node cluster
Elasticsearch: 1.0.0, 6-node cluster, separate from the Hadoop nodes
elasticsearch-hadoop: version M2
Pig: 0.12.0

Any ideas of what is going on here?

Thanks.



(system) #2