Logstash Fingerprint Issue

Hi,
I have been having an issue with my Logstash configuration. I have configured it so that a fingerprint of a few fields is used as the document ID, but the ID is not being populated as expected, even though the hashes themselves look correct. The config is set up to update a document when the same doc ID is used during ingestion. In my case, almost all the documents end up with the same ID, so that one document is updated constantly. There are close to 15,000 unique rows in the file, but only 6 documents are pushed (since the majority land on the same ID and get overwritten by the last row). I have a similar config for other logs and they work fine; only this one log has the issue.
Here is a snippet of the filter config for this log.

    if [log_type] == "XXXXXXX" {
        grok {
            match => { "message" => "%{TIMESTAMP_ISO8601:Activity-Time}\,PAGE=%{QUOTEDSTRING:PAGE}\,DESCRIPTION=%{QUOTEDSTRING:PAGEDESCRIPTION}\,USAGE=%{QUOTEDSTRING:PAGEUSAGE}\,PPI_LEVEL=%{QUOTEDSTRING:PPI_LEVEL}" }
        }
        mutate { gsub => [ "Activity-Time", "\,.*", "" ] }
        mutate { gsub => [ "PPI_LEVEL", "^\"+|\"+$", "" ] }
        mutate { gsub => [ "PAGEUSAGE", "^\"+|\"+$", "" ] }
        mutate { gsub => [ "PAGEDESCRIPTION", "^\"+|\"+$", "" ] }
        mutate { gsub => [ "PAGE", "^\"+|\"+$", "" ] }
        elasticsearch {
            hosts => ["elasticsearch:9200"]
            user => "xxxxxxx"
            password => "xxxxxxx"
            query => "USAGE:%{[PAGEUSAGE]}"
            index => "Some-other-logs"
            fields => {
                "APPLICATION" => "APPLICATION"
            }
        }
        fingerprint {
            key => "xxxxxx"
            method => "SHA1"
            source => ["PAGE", "PAGEDESCRIPTION", "PAGEUSAGE", "PPI_LEVEL"] 
            target => "document_hash_value"
        }
    }
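For reference, my understanding (which may be wrong, so please correct me) is that when a `key` is set, the fingerprint filter computes a keyed HMAC digest of each source value, and with `concatenate_sources` left at its default of `false` the target is written once per source field, so only the last field's hash survives. A rough Python sketch of that per-field HMAC-SHA1 computation, using made-up field values and a placeholder key:

```python
import hashlib
import hmac

# Placeholder key and field values mirroring the fingerprint filter above
# (the real key and data are masked).
key = b"xxxxxx"
fields = {
    "PAGE": "ABSENCE_HISTORY2",
    "PAGEDESCRIPTION": "General Abs. Follow-up Action",
    "PAGEUSAGE": "HABS",
    "PPI_LEVEL": "3",
}

def hmac_sha1(key: bytes, value: str) -> str:
    """HMAC-SHA1 hex digest of one field value -- my assumption of what
    the fingerprint filter computes per source when a key is set."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha1).hexdigest()

# If each source is hashed independently and written to the same target,
# the target ends up holding only the hash of the last field processed.
for name, value in fields.items():
    print(name, "->", hmac_sha1(key, value))
```

If that reading is right, it would explain why rows that differ only in PAGE or DESCRIPTION still collapse to one ID, since most rows share the same PPI_LEVEL value.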

The output block is as follows.

    output {
        if [log_type] in [ <list_of_logs_names> ] {
            elasticsearch {
                hosts => ["elasticsearch:9200"]
                ilm_enabled => "false"
                user => "xxxxxxx"
                password => "XXXXXXX"
                index => "%{log_type}-index"
                action => "update"
                doc_as_upsert => "true"
                document_id => "%{[document_hash_value]}"

            }
        }
        else {
            elasticsearch {
                hosts => ["elasticsearch:9200"]
                ilm_enabled => "false"
                user => "XXXXXXX"
                password => "XXXXXXX"
                index => "%{log_type}-index"
            }
        }
        stdout { codec => rubydebug }
    }

Below are sample logs that I am trying to parse. All rows are unique, but the config needs to ensure that duplicates don't get ingested if they arrive in the future.

    2020-08-24 06:02:40,PAGE="ABSENCE_HISTORY2",DESCRIPTION="General Abs. Follow-up Action",USAGE="HABS",PPI_LEVEL="3"

    2020-08-24 06:02:40,PAGE="ABSENCE_HISTORY3",DESCRIPTION="General Absence Comments",USAGE="HABS",PPI_LEVEL="2"

    2020-08-24 06:02:40,PAGE="ABSENCE_VACATION",DESCRIPTION="Vacation Absence",USAGE="HABS",PPI_LEVEL="3"

    2020-08-24 06:02:40,PAGE="ABSV_PLANS",DESCRIPTION="Vacation Plan",USAGE="HABS",PPI_LEVEL="3"

    2020-08-24 06:02:40,PAGE="ABSV_PLAN_TABLE",DESCRIPTION="Vacation Plan Table",USAGE="HABS",PPI_LEVEL="3"

    2020-08-24 06:02:40,PAGE="ABSV_REQUEST",DESCRIPTION="Vacation Request",USAGE="HABS",PPI_LEVEL="3"

    2020-08-24 06:02:40,PAGE="ABSV_REQ_SEC",DESCRIPTION="Vacation Request Approval",USAGE="HABS",PPI_LEVEL="3"

    2020-08-24 06:02:40,PAGE="ABSW_SCHD_TABLE",DESCRIPTION="Work Schedule Table",USAGE="HABS",PPI_LEVEL="3"

    2020-08-24 06:02:40,PAGE="ABSW_SCHEDULE",DESCRIPTION="EE Work/Holiday Schedule",USAGE="HABS",PPI_LEVEL="3"

    2020-08-24 06:02:40,PAGE="ABSW_TMPL_TABLE",DESCRIPTION="Work Template Table",USAGE="HABS",PPI_LEVEL="3"
    ...
    (about 15,000 more lines)
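To sanity-check the line format, here is a rough Python equivalent of the grok pattern above (an approximation only, with the quotes already stripped the way the gsub filters do):

```python
import re

# Approximate Python translation of the grok pattern used in the filter;
# the named groups stand in for the grok captures.
LINE_RE = re.compile(
    r'^(?P<activity_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),'
    r'PAGE="(?P<page>[^"]*)",'
    r'DESCRIPTION="(?P<description>[^"]*)",'
    r'USAGE="(?P<usage>[^"]*)",'
    r'PPI_LEVEL="(?P<ppi_level>[^"]*)"$'
)

sample = ('2020-08-24 06:02:40,PAGE="ABSENCE_HISTOR2",'
          'DESCRIPTION="General Abs. Follow-up Action",'
          'USAGE="HABS",PPI_LEVEL="3"')
sample = sample.replace("ABSENCE_HISTOR2", "ABSENCE_HISTORY2")
sample = ('2020-08-24 06:02:40,PAGE="ABSENCE_HISTORY2",'
          'DESCRIPTION="General Abs. Follow-up Action",'
          'USAGE="HABS",PPI_LEVEL="3"')

m = LINE_RE.match(sample)
assert m is not None
print(m.groupdict())
```

So the fields split cleanly; the parsing itself does not seem to be the problem.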

In the ingestion logs, the same doc ID appears for 99% of the documents: 3262641910c78d40fa14396c0000000c
I can't share the full logs as they contain a lot of sensitive metadata.
