The message field is fed from a JDBC input.
I have tried concatenate_all_fields as well as concatenate_sources.
So I do one Logstash run, look at the Discover panel in Kibana, and see that the last data point is made up of 78 entries. Then I run the same file again.
If the fingerprint works, the counts for each entry should not change, i.e. the new documents should simply be written on top of the old ones. Instead, the count after the second run is 156, i.e. 78 * 2.
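One way to double-check what Discover shows (assuming Elasticsearch is reachable on localhost:9200, as configured in the output section below) is to ask the target index for its total document count after each run; with a working fingerprint the total should not grow on the second run:

curl -s 'localhost:9200/wed_jun_2019_11_58_postalmod/_count?pretty'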
The complete Logstash config file is shown below:

# This config file is used to parse the SQL data extracted from the db entry: PostalMod
input {
jdbc {
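# pull the PostalMod observation rows for the selected date range from SQL Server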
jdbc_driver_library => "/home/pxg110/sqljdbc_4.2/sqljdbc42.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_user => "CBSA_WLM_SVCg"
jdbc_password => "CBSA_WLM_SVCg"
lowercase_column_names => "false"
jdbc_connection_string => "jdbc:sqlserver://SD01CUVDB0521.OMEGA.DCE-EIR.NET:1433;"
statement => "SELECT ObsDate,ObsHour,TotalCPULoadMIPS,GPPLoadMIPS,zIIPLoadMIPS,GPPPathlenMilsInstr,zIIPPathlenMilsInstr,AvgNetworkTrafficKBsec FROM smg.dbo.smgdata WHERE (ApplnName='PostalMod') AND (ObsDate >= CONVERT(DATETIME, '2019-03-23', 102)) AND (ObsDate <= CONVERT(DATETIME, '2019-05-23', 102)) ORDER BY ObsDate;"
}
}
filter {
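# compute a repeatable fingerprint for each event; it is used as the Elasticsearch document id in the output section so duplicates should overwrite each other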
fingerprint {
source => "message"
target => "[@metadata][fingerprint]"
concatenate_all_fields => "true"
method => "SHA1"
key => "wed_jun_2019_11_58_postalmod"
base64encode => true
}
# defines all the fields to be found in the csv filter
csv {
separator => ","
columns => [
"ObsDate",
"ObsHour",
"TotalCPULoadMIPS",
"GPPLoadMIPS",
"zIIPLoadMIPS",
"GPPPathlenMilsInstr",
"zIIPPathlenMilsInstr",
"AvgNetworkTrafficKBsec"
]
convert => {
"ObsDate" => "date"
"ObsHour" => "integer"
"TotalCPULoadMIPS" => "float"
"GPPLoadMIPS" => "float"
"zIIPLoadMIPS" => "float"
"GPPPathlenMilsInstr" => "float"
"zIIPPathlenMilsInstr" => "float"
"AvgNetworkTrafficKBsec" => "float"
}
}
A typical event, as printed by the rubydebug output, is shown below:
{
"TotalCPULoadMIPS" => 29.07,
"AvgNetworkTrafficKBsec" => 1.98,
"zIIPPathlenMilsInstr" => 43.41,
"GPPLoadMIPS" => 0.175,
"zIIPLoadMIPS" => 28.89,
"@version" => "1",
"GPPPathlenMilsInstr" => 0.26,
"@timestamp" => 2019-03-24T18:00:00.000Z
}
The timestamp matches the date in the SQL results, and there is no date parse error.
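# turn ObsDate into a string so dissect can split out its date parts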
mutate {
convert => { "ObsDate" => "string" }
}
dissect {
mapping => {
"ObsDate" => "%{year}-%{month}-%{day}T%{hour}:%{minute}:%{seconds}.%{ms}Z"
}
}
# ObsDate is an ISO time stamp string, e.g. 2011-04-19T03:44:01.103Z
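# rebuild a yyyy-MM-dd:HH timestamp from the dissected date parts and ObsHour, then parse it into @timestamp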
mutate {
add_field => { "timestamp" => "%{year}-%{month}-%{day}:%{ObsHour}" }
}
date {
match => [ "timestamp", "yyyy-MM-dd:HH"]
target => "@timestamp"
}
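# drop the intermediate date fields now that @timestamp has been set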
mutate {
remove_field => [ "ObsDate", "ObsHour", "year","month", "day","hour","minute","seconds","ms","timestamp"]
}
}
output {
elasticsearch {
action => "index"
hosts => "localhost:9200"
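# the fingerprint becomes the document id, so re-indexing the same row should overwrite the existing document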
document_id => "%{[@metadata][fingerprint]}"
index => "wed_jun_2019_11_58_postalmod"
}
stdout { codec => rubydebug }
}
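To check whether the two runs actually compute the same fingerprint for a given row, the @metadata fields can be made visible on stdout. A minimal sketch, assuming the rubydebug codec's metadata option (not part of the original config), would replace the stdout line above with:

stdout { codec => rubydebug { metadata => true } }

If the printed [@metadata][fingerprint] for the same row differs between the first and second run, the duplicate documents come from a changing fingerprint rather than from the elasticsearch output settings.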