Logstash fingerprint 7.1

I want to ensure that the document ids are based on the input source. For this I'm using the fingerprint filter in Logstash. In a previous version (6.0) I used it as follows:

fingerprint { 
	source => "message"
	target => "[@metadata][fingerprint]"
	method => "SHA1"
	key => "Websphere Metrics by Tran Class"
	base64encode => true
}

With the corresponding output section below:
output {
    elasticsearch {
        action => "index"
        hosts => "localhost:9200"
        index => "websphere"
        document_id => "%{[@metadata][fingerprint]}"
    }
    stdout { codec => rubydebug }
    stdout {}
}
I have 4603 documents I'm trying to ingest into the index websphere.
What I get now is one entry, i.e. the last entry is the only one that survives.
If I remove the fingerprint, I get all the document ids.

1) Is this use of fingerprint not backwards compatible?
2) I have also tried setting concatenate_sources => true, with no change.

What is the proper way to do this duplicate removal in 7.1? Essentially I'm looking for the old 6.0 behaviour in 7.1.

What do your message fields look like? Are they identical by chance?
Maybe you can include the timestamp of the message in the fingerprint?
You could give it an array of ["message","@timestamp"] and use concatenate_sources then?
Alternatively try concatenate_all_fields?
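
A minimal sketch of that suggestion, reusing the settings from your original filter:

fingerprint {
    source => ["message", "@timestamp"]
    target => "[@metadata][fingerprint]"
    concatenate_sources => true
    method => "SHA1"
    key => "Websphere Metrics by Tran Class"
    base64encode => true
}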

What is the document id on that entry?

The message field is input from a jdbc connection.
I have tried concatenate_all_fields as well as concatenate_sources.
So I do one Logstash run, look at the Discover panel and see that the last data point is made up of 78 entries. Then I run the same file again.
If fingerprint works, the counts for each entry should not change, i.e. the documents will simply be written on top of each other. Counts after the second run: 156, i.e. 78*2.

The complete Logstash file is shown below:

# This config file is used to parse the sql data extracted from the db entry: PostalMod
input {
    jdbc {
        jdbc_driver_library => "/home/pxg110/sqljdbc_4.2/sqljdbc42.jar"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_user => "CBSA_WLM_SVCg"
        jdbc_password => "CBSA_WLM_SVCg"
        lowercase_column_names => "false"
        jdbc_connection_string => "jdbc:sqlserver://SD01CUVDB0521.OMEGA.DCE-EIR.NET:1433;"
        statement => "SELECT ObsDate,ObsHour,TotalCPULoadMIPS,GPPLoadMIPS,zIIPLoadMIPS,GPPPathlenMilsInstr,zIIPPathlenMilsInstr,AvgNetworkTrafficKBsec FROM smg.dbo.smgdata WHERE (ApplnName='PostalMod') AND (ObsDate >= CONVERT(DATETIME, '2019-03-23', 102)) AND (ObsDate <= CONVERT(DATETIME, '2019-05-23', 102)) ORDER BY ObsDate;"
    }
}
filter {
    fingerprint {
        source => "message"
        target => "[@metadata][fingerprint]"
        concatenate_all_fields => "true"
        method => "SHA1"
        key => "wed_jun_2019_11_58_postalmod"
        base64encode => true
    }

The csv filter below defines all the fields to be found in the csv file.

    csv {
            separator => ","
            columns =>    [
                    "ObsDate",
                    "ObsHour",
                    "TotalCPULoadMIPS",
                    "GPPLoadMIPS",
                    "zIIPLoadMIPS",
                    "GPPPathlenMilsInstr",
                    "zIIPPathlenMilsInstr",
                    "AvgNetworkTrafficKBsec"
            ]
            convert =>    {
                    "ObsDate"  =>  "date"
                    "ObsHour"  =>  "integer"
                    "TotalCPULoadMIPS"  =>  "float"
                    "GPPLoadMIPS"  =>  "float"
                    "zIIPLoadMIPS"  =>  "float"
                   "GPPPathlenMilsInstr"  =>  "float"
                    "zIIPPathlenMilsInstr"  =>  "float"
                    "AvgNetworkTrafficKBsec"  =>  "float"

            }
    }

A typical output is shown below:
{
    "TotalCPULoadMIPS" => 29.07,
    "AvgNetworkTrafficKBsec" => 1.98,
    "zIIPPathlenMilsInstr" => 43.41,
    "GPPLoadMIPS" => 0.175,
    "zIIPLoadMIPS" => 28.89,
    "@version" => "1",
    "GPPPathlenMilsInstr" => 0.26,
    "@timestamp" => 2019-03-24T18:00:00.000Z
}
The timestamp matches the date on the sql results. No datetimeparse error.

    mutate {
            convert => { "ObsDate" => "string" }
    }
    dissect {
            mapping => {
                    "ObsDate" => "%{year}-%{month}-%{day}T%{hour}:%{minute}:%{seconds}.%{ms}Z"
            }
    }

ObsDate is an ISO time stamp, e.g. 2011-04-19T03:44:01.103Z.

    mutate {
            add_field => {  "timestamp" => "%{year}-%{month}-%{day}:%{ObsHour}"  }
    }
    date {
            match => [ "timestamp",  "yyyy-MM-dd:HH"]
            target => "@timestamp"
    }
    mutate {
            remove_field => [ "ObsDate", "ObsHour", "year","month", "day","hour","minute","seconds","ms","timestamp"]
    }

}

output {
    elasticsearch {
        action => "index"
        hosts => "localhost:9200"
        document_id => "%{[@metadata][fingerprint]}"
        index => "wed_jun_2019_11_58_postalmod"
    }
    stdout { codec => rubydebug }
}

So, do you even have a message field?
Can you add a file output and check the output JSON for what exact data is extracted on your runs?
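
Something like this in the output section would do it (the path here is just a placeholder):

file {
    path => "/tmp/logstash-debug.json"
    codec => json_lines
}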

If you do not have a message field then I would expect all of the documents to have the document id "ovLqdkaUAOyjWzfsW9WXJqjwuew="
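
If that is the case, one option (as already suggested above) could be to fingerprint the jdbc columns themselves instead of "message". A sketch only, using the column names and key from your config:

fingerprint {
    # keep this at the top of the filter block, before ObsDate/ObsHour are removed
    source => ["ObsDate", "ObsHour", "TotalCPULoadMIPS", "GPPLoadMIPS", "zIIPLoadMIPS",
               "GPPPathlenMilsInstr", "zIIPPathlenMilsInstr", "AvgNetworkTrafficKBsec"]
    target => "[@metadata][fingerprint]"
    concatenate_sources => true
    method => "SHA1"
    key => "wed_jun_2019_11_58_postalmod"
    base64encode => true
}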
