Cannot create update index pipeline without a unique identifier in the database

Hi all.

I want to create an index update pipeline using the jdbc input and the elasticsearch output.
I have a left-joined query that combines 4 tables in the database.
But I do not have a unique identifier in the database, so I cannot build a unique ID to use as document_id.
This means that if a record is updated, I cannot update the corresponding document in the index.

I tried to generate a UUID in the filter section, but it created a new document ID even when the database record was only updated.
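
For reference, my filter section looked roughly like this (the target field name is just an example):

filter {
  uuid {
    # Writes a random UUID into the target field on every event,
    # so the same database row gets a different ID on each run.
    target => "[@metadata][generated_id]"
  }
}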

I also tried combining more than one field to create a custom document_id, but the combination was not unique.

Is there any solution for this?
Thanks.

Are you using Logstash? If so, this is expected; it is how the uuid filter works. It generates a unique ID for every event, as explained in the documentation.

This is useful if you need to generate a string that’s unique for every event, even if the same input is processed multiple times. If you want to generate strings that are identical each time an event with a given content is processed (i.e. a hash) you should use the fingerprint filter instead.

How did you combine them? Using the fingerprint filter?

It depends entirely on your data; you need a field or a combination of fields that is unique to be able to have a custom unique _id in Elasticsearch.
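
As a sketch of the fingerprint approach, assuming col1, col2 and col3 together identify a row (hypothetical field names):

filter {
  fingerprint {
    # Hash the concatenation of the source fields, so the same
    # row content always produces the same ID.
    source => ["col1", "col2", "col3"]
    concatenate_sources => true
    method => "SHA256"
    target => "[@metadata][fingerprint]"
  }
}

output {
  elasticsearch {
    # Use the stable hash as the document ID so an update overwrites
    # the existing document instead of creating a new one.
    # (hosts, credentials etc. omitted)
    document_id => "%{[@metadata][fingerprint]}"
    action => "update"
    doc_as_upsert => true
  }
}

Note this only helps if the fields you hash never change; if any of them is updated, the fingerprint changes too and you get a new document.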


Thanks for the reply.

Since I generate the UUID in the filter section and it creates a distinct ID per event, if someone changes a database record the uuid plugin creates a new ID for the updated version as well. Therefore I cannot update the existing document, so uuid does not work.

I do not have a unique identifier in the database, which means I cannot update my documents in the index.

Initially I thought of creating a composite key in the database query by combining more than one column. But it is not unique either, and if one of those columns changes, we no longer have a stable identifier.

SELECT COL1 || '_' || COL2 || '_' || COL13 AS COMPOSITE_KEY,
       COL4,
       COL5, etc.

output {
  elasticsearch {
    id => "a_name"
    action => "update"
    index => "my_idx"
    document_id => "%{COMPOSITE_KEY}"
    doc_as_upsert => true
    hosts => ["array of hosts"]
    cacert => 'cert'
    user => "user"
    password => "password"
  }
}

There is no solution, then. So I will reindex the whole database each time by using 2 indices: removing one and filling the other. Does someone have any idea how I can do that?

No, UUID will not work. If you are not able to figure out which document to update, there is no automatic way to do so. Reindexing the full dataset each time may therefore be your best option.

One way to do this is to create an alias that you query the data through. You can then build the new index in the background and switch the alias to point to the new version before you remove the old one. There is no automatic way to do this, so you may need to create a script to handle the reindexing.
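
As a sketch (the index names are hypothetical), the alias switch itself can be done atomically with the _aliases API:

POST _aliases
{
  "actions": [
    { "remove": { "index": "my_idx_v1", "alias": "my_idx" } },
    { "add":    { "index": "my_idx_v2", "alias": "my_idx" } }
  ]
}

Because both actions run in a single request, queries against the my_idx alias never see an empty or half-filled index during the switch.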

