Good point - I am using Logstash to re-ingest the same CSV files.
I have fingerprinted the records, so the next time I run Logstash it should generate the same document ID for the same record.
Are you saying that if the record data is identical, no new version will be created? That could be the explanation; I will test it later today and update the thread.
If you use the Index API, the version will increase, because the Index API does not retrieve the existing document to perform a diff. However, if you use the Update API and the document hasn't changed, the version won't be bumped (i.e. the operation is a noop).
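To make the difference concrete, here is a toy in-memory model in Python. It is only a sketch of the versioning behaviour described above, not the Elasticsearch client: `ToyIndex`, its methods, and the sample record are hypothetical names made up for illustration.

```python
class ToyIndex:
    """In-memory stand-in for an index (hypothetical, NOT the Elasticsearch client)."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source dict)

    def index(self, doc_id, source):
        # Index-style write: overwrite without comparing, so the version always grows.
        version = self.docs.get(doc_id, (0, None))[0] + 1
        self.docs[doc_id] = (version, dict(source))
        return version

    def update(self, doc_id, partial):
        # Update-style write: merge into the stored document and compare first.
        version, current = self.docs[doc_id]
        merged = {**current, **partial}
        if merged == current:
            return version  # nothing changed: noop, version stays the same
        self.docs[doc_id] = (version + 1, merged)
        return version + 1


store = ToyIndex()
record = {"column_a": "abc", "run_date": "2017-10-25"}

v_first_index = store.index("id-1", record)    # new document
v_second_index = store.index("id-1", record)   # identical data, version still bumps
v_noop_update = store.update("id-1", record)   # identical data, version unchanged
v_real_update = store.update("id-1", {"column_a": "xyz"})  # real change, version bumps
```

In this model, re-indexing identical data still increments the version, while an update with identical data leaves it alone.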
This is because the ID is different on each indexing run, so the existing document is not updated and a new one is created instead. What configuration do you have for the document_id setting in the elasticsearch output?
run_date is also a constant (example: 2017-10-25); you can call it column_b. Once run_date is in the CSV file, it never changes.
When I combine column_a+column_b (run_date) using fingerprint and use the hash as document_id, then the next time I index the same record I expect the same hash ==> the same document ID, because nothing has changed (except maybe the timestamp).
However, column_a+column_b are not changing.
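As a sanity check of that expectation, here is a minimal Python sketch of the idea. It assumes a plain SHA-1 over the two fields joined with a separator, which is not necessarily what the Logstash fingerprint filter computes (that depends on its method and key settings); the `fingerprint` helper, the `|` separator, and the sample values are hypothetical.

```python
import hashlib


def fingerprint(record, fields=("column_a", "run_date")):
    """Hash only the chosen fields; anything else in the record is ignored."""
    joined = "|".join(str(record[name]) for name in fields)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


# Same CSV row ingested on two different days: only the timestamp differs.
first_run = {
    "column_a": "abc",
    "run_date": "2017-10-25",
    "@timestamp": "2017-10-25T08:00:00Z",
}
second_run = {
    "column_a": "abc",
    "run_date": "2017-10-25",
    "@timestamp": "2017-10-26T09:30:00Z",
}

same_id = fingerprint(first_run) == fingerprint(second_run)
```

As long as the timestamp (or any other changing field) is not part of the fingerprint source, the hash, and therefore the document ID, should be identical on every run. If the IDs still differ, the fingerprint source fields or the document_id setting would be the first things to check.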