Document version number not advancing

Hello,
I just tested how versioning works: I loaded the same set of CSV files again with the same loading script, so everything is identical.

Looking at the "Discover" section of Kibana, I expected to see the @version number in my documents advance from "1" to "2", but that did not happen.

What could be the cause?

How do you perform your updates? In bulk? Can you show your script?

Note that if you use the Update API and nothing changes in your document, the version is not increased (i.e. it is a noop).

Good point - I am using Logstash to re-ingest the same CSV files.
I have fingerprinted the records, so the next time I run Logstash it should generate the same document ID for the same record.

Are you saying that if the record data is identical then no new version will be created? That could be the explanation. I will test it later today and update the thread.

If you use the Index API, the version will increase, because the Index API does not retrieve the existing document to perform a diff. However, if you use the Update API and the document hasn't changed, the version won't be bumped (i.e. it is a noop).
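To make the difference concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described above, not the Elasticsearch client: `ToyIndex` and its methods are hypothetical names made up for illustration. One related point to check: as far as I know, `@version` is a field Logstash adds to each event, while Elasticsearch keeps its own counter in the `_version` metadata, so the sketch tracks `_version`.

```python
# Toy in-memory model of the versioning behaviour described above.
# NOT the Elasticsearch client: ToyIndex and its methods are hypothetical.


class ToyIndex:
    def __init__(self):
        # doc_id -> {"_version": int, "_source": dict}
        self.docs = {}

    def index(self, doc_id, source):
        """Overwrite the document without comparing; the version always advances."""
        previous = self.docs.get(doc_id)
        version = previous["_version"] + 1 if previous else 1
        self.docs[doc_id] = {"_version": version, "_source": dict(source)}
        return version

    def update(self, doc_id, partial):
        """Merge a partial document; identical content is treated as a noop."""
        current = self.docs[doc_id]
        merged = {**current["_source"], **partial}
        if merged == current["_source"]:
            return current["_version"]  # noop: version unchanged
        current["_source"] = merged
        current["_version"] += 1
        return current["_version"]


store = ToyIndex()
record = {"column_a": "abc", "run_date": "2017-10-25"}

v_first = store.index("doc-1", record)      # new document
v_reindex = store.index("doc-1", record)    # same content, indexed again
v_noop = store.update("doc-1", record)      # same content via update
v_changed = store.update("doc-1", {"column_c": "new"})  # real change

print(v_first, v_reindex, v_noop, v_changed)  # 1 2 2 3
```

In this model, re-indexing identical content still moves the version from 1 to 2, while the identical update leaves it at 2.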

I use neither directly. Indexing is done via Logstash with the CSV plugin. What does it do in the background? I think it makes Index API calls.

So far versioning does not work as expected.

  1. I have many columns in my CSV
  2. I fingerprint 2 of the columns and index the CSV file using Logstash
  3. I change one of the columns that does not participate in the fingerprint and re-run Logstash

The result is new documents instead of incrementing the version of the existing documents.

This is because the ID is different on each indexing run, so the existing document is not updated and a new one is created instead. What configuration do you have for the document_id setting in the elasticsearch output?
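A small Python sketch of why the ID matters. It assumes a toy model in which every write to an ID bumps that ID's version by one and nothing is ever deleted; the `replay` helper and the IDs are hypothetical.

```python
from collections import Counter


def replay(doc_ids):
    """Return {doc_id: version} after indexing each ID in order (toy model)."""
    # Each write to an existing ID bumps its version; a new ID starts at 1.
    return dict(Counter(doc_ids))


# Same ID on both runs: one document whose version advances.
stable = replay(["id-A", "id-A"])

# Different ID on the second run: two documents, each still at version 1.
unstable = replay(["id-A", "id-B"])

print(stable)    # {'id-A': 2}
print(unstable)  # {'id-A': 1, 'id-B': 1}
```

So if re-running the load produces extra documents rather than a higher version, the first thing to compare is the ID each run generated for the same record.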

I use the fingerprint result as the document ID:
document_id => "%{[@metadata][fingerprint]}"

That is after setting the fingerprint filter to:
fingerprint {
  method => "SHA1"
  source => [ "column_a", "run_date" ]
  concatenate_sources => true
  target => "[@metadata][fingerprint]"
  key => "SOMEKEY"
}

OK, then I guess column_a is constant, but where does run_date come from?

run_date is also a constant (example: 2017-10-25); you can call it column_b. Once run_date is in the CSV file it never changes.

When I combine column_a + column_b (run_date) using fingerprint and use the hash as document_id, then the next time I index the same record I expect the same hash, and therefore the same document ID, because nothing has changed (except maybe the timestamp).
Neither column_a nor column_b is changing.
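The expectation above can be checked outside Logstash with a short Python sketch. It assumes the keyed SHA1 fingerprint behaves like an HMAC over the selected columns. The `fingerprint` function and the way the columns are joined are illustrative only and do not reproduce the plugin's exact byte layout, so the hashes will not match the ones Logstash produces.

```python
import hashlib
import hmac


def fingerprint(record, sources, key):
    """Keyed SHA-1 over the selected columns only (illustrative layout)."""
    message = "|".join(f"{name}={record[name]}" for name in sources)
    return hmac.new(key.encode(), message.encode(), hashlib.sha1).hexdigest()


sources = ["column_a", "run_date"]

# column_c is a hypothetical column that is NOT part of the fingerprint.
first_run = {"column_a": "abc", "run_date": "2017-10-25", "column_c": "old"}
second_run = {"column_a": "abc", "run_date": "2017-10-25", "column_c": "new"}

id_first = fingerprint(first_run, sources, "SOMEKEY")
id_second = fingerprint(second_run, sources, "SOMEKEY")

# Changing a fingerprinted column yields a different ID.
id_other = fingerprint({**first_run, "run_date": "2017-10-26"}, sources, "SOMEKEY")

print(id_first == id_second)  # True
print(id_first == id_other)   # False
```

If the IDs in the index differ between runs even though both source columns are unchanged, it may be worth printing the event (including @metadata) to see what the fingerprint filter actually wrote to the target field.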
