Fingerprint option in logstash filter not working properly?

I have used the code below, but it is only inserting one record into ES.

filter {
  fingerprint {
    target => "document_id"
    method => "SHA256"
    key => "9ced3827c6a1c9dafac6da9abac41386ba1038ac95b3a865a0951bc2e948c58c"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    document_id => "%{document_id}"
  }
}

Please point out the mistake. I have approx. 200,000 (2 lakh) records in my SQL database which I am importing to ES, and I want a unique id for each record using this fingerprint.

I don't know the answer, but you could also not set the id yourself and let Elasticsearch decide it.
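For illustration, a minimal sketch of an output with no explicit id (the index name is a placeholder; without document_id, Elasticsearch auto-generates a unique _id per document):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "my-index"    # placeholder index name
    # no document_id option: Elasticsearch assigns a random unique _id
  }
}

Note that with auto-generated ids, re-running the same import will create duplicate documents rather than overwrite existing ones.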

Hello,

Can you explain what is not working? What is your output and what do you expect?

Also, since you are not setting the source option in the fingerprint filter, it will default to the message field. Do you have a message field in the event? You didn't share the entire pipeline, so there is no way to know.

What is 2 lac? Is it 2 k? 200 k? 2 million?


Can you please tell me how I can insert 200,000 records uniquely into ES using fingerprint?

Unless you have a message field to base the fingerprint on, you need to specify the source field(s) to use in the plugin. It would help if you showed your full pipeline as well as a sample event.
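As a sketch (the field names are placeholders for whatever columns your SQL statement returns; pick the fields that together uniquely identify a row):

filter {
  fingerprint {
    # Hash the fields that together form the primary key.
    source => ["field_a", "field_b", "field_c"]    # placeholder field names
    concatenate_sources => true
    target => "document_id"
    method => "SHA256"
    key => "some-secret-key"
  }
}

With concatenate_sources enabled, one hash is computed over all listed fields together, so two rows only get the same id when all of those fields match.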

Sample data:

MON_NO | CODE    | SUM     | DATE                | YEAR_NO | TIME_DOC
APR    | SUN0455 | 58639.5 | 2021-04-01 00:00:00 | 2021    | 2021-04-01 00:00:00
APR    | SUN0455 | 58639.5 | 2021-04-01 00:00:00 | 2021    | 2021-04-01 00:00:00
MAY    | SUN0457 | 639.5   | 2021-05-01 00:00:00 | 2021    | 2021-05-01 00:00:00
APR    | SUN0455 | 58639.5 | 2021-04-01 00:00:00 | 2021    | 2021-04-01 00:00:00
APR    | SUN0455 | 58639.5 | 2021-04-01 00:00:00 | 2021    | 2021-04-01 00:00:00

input {
  jdbc {
    # Oracle (connection settings omitted here)
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    statement_filepath => "folio_aum.sql"
  }
}

filter {
  fingerprint {
    target => "document_id"
    method => "SHA256"
    key => "9ced3827c6a1c9dafac6da9abac41386ba1038ac95b3a865a0951bc2e948c58c"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    document_id => "%{document_id}"
  }
}

Please help me send this data uniquely into ES.

@Christian_Dahlqvist can you help, as per the sample data I posted?

What does a sample document generated by the jdbc input plugin look like? Which of these fields can be used to uniquely identify an event/document?

@Christian_Dahlqvist The JDBC output is the same as the sample data I shared. I am basically getting the data from SQL and importing it into ES. We can take any number of fields; I have 15 columns like this. How can I use fingerprint to generate a unique id for each record? For example, Python has UUID, which generates a unique id for each record; similarly, how do I use fingerprint to generate a unique id?

Please show the sample document in JSON form as it looks when inserted into Elasticsearch. This will show the fields available, which depend on the SQL statement.

The fingerprint filter allows you to generate a hash based on the fields in the data that form a primary key, which can then be used as the document id. If you run the JDBC query multiple times, this prevents duplicates from being created.

You could instead set the fingerprint plugin to generate a UUID, but that would be unique every time it runs for a record. If you reprocess the same data through the JDBC plugin that data would get inserted multiple times. Rather than using a UUID it would be more efficient to let Elasticsearch assign the ID.
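For comparison, the UUID variant would look like this (sketch only; with method => "UUID" the filter ignores source fields and produces a fresh random id on every run):

filter {
  fingerprint {
    target => "document_id"
    method => "UUID"    # new random id each run, so re-imports create duplicates
  }
}

This is why a content-based hash (SHA256 over the primary-key fields) is the right choice for idempotent re-imports, while a UUID only makes sense for data that is ingested exactly once.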

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.