Logstash Grok - duplicate entries after parsing

I have a peculiar situation where grok match parsing produces duplicate entries in every field.

I have tried both settings of break_on_match: with true, it is not able to parse the required fields; with false, it puts duplicate entries in each field.

Below is the grok pattern used:

Logstash Grok pattern

if [fields][app] == "xyz" {
    grok {
        patterns_dir => ["/usr/share/logstash/patterns"]
        match => { "message" => "%{COMBINEDAPACHELOG} %{QS:trueclientip} %{QS:filetype}" }
        match => { "message" => "%{COMBINEDAPACHELOG} \"%{IP:trueclientip}\" %{QS:filetype}" }
        break_on_match => false
    }
}

JSON output

{
"_index": "XYZ-prod-cons-adc-ohs-2018.02.19",
"_type": "ohs",
"_id": "AWGu1kCJ_An67UGq4G4Q",
"_version": 1,
"_score": null,
"_source": {
"request": [
"/content/web/HRPAGE123",
"/content/web/HRPAGE123"
],
"filetype": [
"\"image/jpeg\"",
"\"image/jpeg\""
],
"agent": [
"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
],
"client_timezone": " North America Central time(CT)",
"auth": [
"john.k@xyz.com",
"john.k@xyz.com"
],
"ident": [
"-",
"-"
],
"tz": [
"-0600",
"-0600"
],
"trueclientip": [
"100.17.21.74",
"100.17.21.74"
],
"cl-geoip": {
"timezone": "America/Los_Angeles",
"continent_code": "NA",
"city_name": "Redwood City",
"country_name": "United States",
"country_code2": "US",
"dma_code": 807,
"country_code3": "US",
"region_name": "California",
"postal_code": "94065",
"region_code": "CA"
},
"tcl-geoip": {},
"clientip": [
"10.17.21.74",
"10.17.21.74"
],
"@version": "1",
"host": "XYZserver1",
"referrer_host": [
"XYZ.com",
"XYZ.com"
],
"verb": [
"GET",
"GET"
],
"message": "10.17.21.74 - - [15/Feb/2018:04:28:59 -0600] \"GET /content/web/HRPAGE123 HTTP/1.1\" 200 28085 \"https://XYZ.com/site/fin/gfo/GlobalProcesses/GPTD/XyZ4358.html\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko\" \"100.17.21.74\" \"image/jpeg\" ",
"tags": [
"beats_input_codec_plain_applied",
"_grokparsefailure",
"_dateparsefailure",
"_geoip_lookup_failure"
],
"referrer": [
"https://XYZ.com/site/fin/gfo/GlobalProcesses/GPTD/XyZ4358.html",
"https://XYZ.com/site/fin/gfo/GlobalProcesses/GPTD/XyZ4358.html"
],
"@timestamp": "2018-02-19T15:43:54.712Z",
"filename": [
"HRPAGE123",
"HRPAGE123"
],
"response": [
"200",
"200"
],
"bytes": [
"28085",
"28085"
],
"httpversion": [
"1.1",
"1.1"
],
"fields": {
"app": "XYZ",
"log_type": "ohs",
"tier": "cons",
"lc": "prod",
"property": "XYZ.com",
"dc": "adc"
},
"lob": [
"fin",
"fin"
]
},
"fields": {
"@timestamp": [
1519055034712
]
},
"sort": [
1519055034712
]
}

Please let me know why this is happening.

How can I avoid this situation?

I suspect this is because you have not configured this as an array of patterns as described in the docs. As the first pattern seems to match anything the second one does, you might actually be able to simply remove the second pattern.
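For illustration, here is a minimal sketch of the array form, assuming the same two patterns and the conditional from the question. With break_on_match left at its default of true, grok stops at the first pattern that matches, so each field is captured only once. With break_on_match => false, every matching pattern adds its captures again, which is what turns the fields into two-element arrays. Listing the stricter IP pattern first is my own choice here, not something confirmed in this thread.

```
filter {
  if [fields][app] == "xyz" {
    grok {
      patterns_dir => ["/usr/share/logstash/patterns"]
      # One match setting; the patterns for "message" are tried in order.
      match => {
        "message" => [
          "%{COMBINEDAPACHELOG} \"%{IP:trueclientip}\" %{QS:filetype}",
          "%{COMBINEDAPACHELOG} %{QS:trueclientip} %{QS:filetype}"
        ]
      }
      # break_on_match is not set, so the default (true) applies:
      # grok stops after the first pattern that matches.
    }
  }
}
```

If only one of the two formats ever appears in the logs, a single pattern is enough and the array is unnecessary.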

Both the patterns are the same.

With only the first pattern, the record is not parsed; but when I include both patterns, both are applied to the same log record and each field gets two entries, which is quite strange.

And break_on_match => true is also not working.

Is the following basic example not working?

grok {
  match => {"message" => "%{COMBINEDAPACHELOG} %{QS:trueclientip} %{QS:filetype}"}
}

Can you configure a stdout output plugin with a rubydebug codec and show the result for an event?
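A minimal sketch of such an output section, assuming it is added alongside the existing outputs in the pipeline configuration:

```
output {
  # Print every event to the console in a readable, Ruby-style format
  # so the fields produced by grok can be inspected directly.
  stdout { codec => rubydebug }
}
```

The printed event shows the fields exactly as Logstash holds them before indexing, which makes it easier to see whether the duplication happens in the grok filter or later.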
