Fingerprint option in logstash filter not working properly?

I have used the code below, but it is only inserting one record into ES.

filter {
  fingerprint {
    target => "document_id"
    method => "SHA256"
    key => "9ced3827c6a1c9dafac6da9abac41386ba1038ac95b3a865a0951bc2e948c58c"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    document_id => "%{document_id}"
  }
}

Please point out the mistake. I have approx. 200,000 (2 lakh) records in my SQL database which I am importing to ES, and I want a unique id for each record using this fingerprint.

I don't know the answer, but you could also not set the id yourself and let Elasticsearch decide it.
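For illustration, a minimal sketch of an output with no explicit id (the index name is a placeholder; without document_id, Elasticsearch auto-generates a unique _id per document):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "my-index"    # placeholder index name
    # no document_id option: Elasticsearch assigns a random unique _id
  }
}

Note that with auto-generated ids, re-running the same import will create duplicate documents rather than overwrite existing ones.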

Hello,

Can you explain what is not working? What is your output and what do you expect?

Also, since you are not setting the source option in the fingerprint filter, it will default to the message field. Do you have a message field in the event? You didn't share the entire pipeline, so there is no way to know.

What is 2 lac? Is it 2 k? 200 k? 2 million?


Can you please tell me how I can insert 200,000 records uniquely into ES using fingerprint?

Unless you have a message field to base the fingerprint on, you need to specify the source field(s) to use in the plugin. It would help if you showed your full pipeline as well as a sample event.
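As a sketch (the field names are placeholders for whatever columns your SQL statement returns; pick the fields that together uniquely identify a row):

filter {
  fingerprint {
    # Hash the fields that together form the primary key.
    source => ["field_a", "field_b", "field_c"]    # placeholder field names
    concatenate_sources => true
    target => "document_id"
    method => "SHA256"
    key => "some-secret-key"
  }
}

With concatenate_sources enabled, one hash is computed over all listed fields together, so two rows only get the same id when all of those fields match.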

Sample data:

MON_NO | CODE    | SUM     | DATE                | YEAR_NO | TIME_DOC
APR    | SUN0455 | 58639.5 | 2021-04-01 00:00:00 | 2021    | 2021-04-01 00:00:00
APR    | SUN0455 | 58639.5 | 2021-04-01 00:00:00 | 2021    | 2021-04-01 00:00:00
MAY    | SUN0457 | 639.5   | 2021-05-01 00:00:00 | 2021    | 2021-05-01 00:00:00
APR    | SUN0455 | 58639.5 | 2021-04-01 00:00:00 | 2021    | 2021-04-01 00:00:00
APR    | SUN0455 | 58639.5 | 2021-04-01 00:00:00 | 2021    | 2021-04-01 00:00:00

input {
  jdbc {
    # Oracle (connection settings omitted here)
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    statement_filepath => "folio_aum.sql"
  }
}

filter {
  fingerprint {
    target => "document_id"
    method => "SHA256"
    key => "9ced3827c6a1c9dafac6da9abac41386ba1038ac95b3a865a0951bc2e948c58c"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    document_id => "%{document_id}"
  }
}

Please help me send this data uniquely into ES.

@Christian_Dahlqvist can you help, as per the sample data I posted?

What does a sample document generated by the jdbc input plugin look like? Which of these fields can be used to uniquely identify an event/document?

@Christian_Dahlqvist The JDBC output is the same as the sample data I shared. I am basically getting the data from SQL and importing it into ES. We can take any number of fields; I have 15 columns like this. How can I use fingerprint to generate a unique id for each record? For example, Python has UUID, which generates a unique id for each record; similarly, how do I use fingerprint to generate a unique id?

Please show the sample document in JSON form as it looks when inserted into Elasticsearch. This will show the fields available, which depend on the SQL statement.

The fingerprint filter allows you to generate a hash based on the fields in the data that form a primary key, which can then be used as the document id. If you run the JDBC query multiple times, this prevents duplicates from being created.

You could instead set the fingerprint plugin to generate a UUID, but that would be unique every time it runs for a record. If you reprocess the same data through the JDBC plugin that data would get inserted multiple times. Rather than using a UUID it would be more efficient to let Elasticsearch assign the ID.
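For comparison, the UUID variant would look like this (sketch only; with method => "UUID" the filter ignores source fields and produces a fresh random id on every run):

filter {
  fingerprint {
    target => "document_id"
    method => "UUID"    # new random id each run, so re-imports create duplicates
  }
}

This is why a content-based hash (SHA256 over the primary-key fields) is the right choice for idempotent re-imports, while a UUID only makes sense for data that is ingested exactly once.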

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.