I have a peculiar situation where the grok match parsing is producing duplicate entries after parsing.
I have tried break_on_match => true (with true, it is not able to parse the required fields); with false, it produces duplicate entries.
Below is the grok pattern used:
Logstash Grok pattern
if [fields][app] == "xyz" {
  grok {
    patterns_dir => ["/usr/share/logstash/patterns"]
    match => { "message" => "%{COMBINEDAPACHELOG} %{QS:trueclientip} %{QS:filetype}" }
    match => { "message" => "%{COMBINEDAPACHELOG} \"%{IP:trueclientip}\" %{QS:filetype}" }
    break_on_match => false
  }
}
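I also considered consolidating the two patterns into a single match option with an array of patterns, so grok tries them in order and (with the default break_on_match => true) stops at the first one that matches. I am not sure whether this is the right approach for my case:

if [fields][app] == "xyz" {
  grok {
    patterns_dir => ["/usr/share/logstash/patterns"]
    match => {
      "message" => [
        "%{COMBINEDAPACHELOG} \"%{IP:trueclientip}\" %{QS:filetype}",
        "%{COMBINEDAPACHELOG} %{QS:trueclientip} %{QS:filetype}"
      ]
    }
  }
}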
JSON output
{
"_index": "XYZ-prod-cons-adc-ohs-2018.02.19",
"_type": "ohs",
"_id": "AWGu1kCJ_An67UGq4G4Q",
"_version": 1,
"_score": null,
"_source": {
"request": [
"/content/web/HRPAGE123",
"/content/web/HRPAGE123"
],
"filetype": [
"\"image/jpeg\"",
"\"image/jpeg\""
],
"agent": [
"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
],
"client_timezone": " North America Central time(CT)",
"auth": [
"john.k@xyz.com",
"john.k@xyz.com"
],
"ident": [
"-",
"-"
],
"tz": [
"-0600",
"-0600"
],
"trueclientip": [
"100.17.21.74",
"100.17.21.74"
],
"cl-geoip": {
"timezone": "America/Los_Angeles",
"continent_code": "NA",
"city_name": "Redwood City",
"country_name": "United States",
"country_code2": "US",
"dma_code": 807,
"country_code3": "US",
"region_name": "California",
"postal_code": "94065",
"region_code": "CA"
},
"tcl-geoip": {},
"clientip": [
"10.17.21.74",
"10.17.21.74"
],
"@version": "1",
"host": "XYZserver1",
"referrer_host": [
"XYZ.com",
"XYZ.com"
],
"verb": [
"GET",
"GET"
],
"message": "10.17.21.74 - - [15/Feb/2018:04:28:59 -0600] \"GET /content/web/HRPAGE123 HTTP/1.1\" 200 28085 \"https://XYZ.com/site/fin/gfo/GlobalProcesses/GPTD/XyZ4358.html\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko\" \"100.17.21.74\" \"image/jpeg\" ",
"tags": [
"beats_input_codec_plain_applied",
"_grokparsefailure",
"_dateparsefailure",
"_geoip_lookup_failure"
],
"referrer": [
"https://XYZ.com/site/fin/gfo/GlobalProcesses/GPTD/XyZ4358.html",
"https://XYZ.com/site/fin/gfo/GlobalProcesses/GPTD/XyZ4358.html"
],
"@timestamp": "2018-02-19T15:43:54.712Z",
"filename": [
"HRPAGE123",
"HRPAGE123"
],
"response": [
"200",
"200"
],
"bytes": [
"28085",
"28085"
],
"httpversion": [
"1.1",
"1.1"
],
"fields": {
"app": "XYZ",
"log_type": "ohs",
"tier": "cons",
"lc": "prod",
"property": "XYZ.com",
"dc": "adc"
},
"lob": [
"fin",
"fin"
]
},
"fields": {
"@timestamp": [
1519055034712
]
},
"sort": [
1519055034712
]
}
Please let me know why this is happening.
How can I avoid this situation?