Grok parsing issue on Bitbucket/Sonar API Logs


(Karan) #1

Hi,

In my Logstash pipeline, I'm trying to use grok to segregate/filter out meaningful bits of information from the API call output logs of tools like Bitbucket/SonarQube etc., in order to fetch metrics data into Elasticsearch.

The pattern works fine on grokdebug.herokuapp.com when I test it there; however, I get _grokparsefailure under tags on the Kibana dashboard for this particular index:

API Call Output -->

{
  "committer": {
    "emailAddress": "xyz@aon.com",
    "name": "XYZ"
  },
  "committerTimestamp": 1512496703000,
  "author": {
    "emailAddress": "xyz@aon.com",
    "name": "XYZ"
  },
  "authorTimestamp": 1512456913000,
  "id": "86620bfe84d0cf250a86189c7bac7c0433a9a056",
  "displayId": "86620bfe84d",
  "message": "Create",
  "parents": [
    {
      "id": "a3a100c5ea865a813275759377f1b1d07e4f7db7",
      "displayId": "a3a100c5ea8"
    }
  ]
},

grok pattern for the above log output -->

{\n%{SPACE}%{SPACE}"%{GREEDYDATA:details}":%{SPACE}{\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}"%{GREEDYDATA:Email}":%{SPACE}"%{GREEDYDATA:CommitterEmailAddress}",\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}"%{GREEDYDATA:Name}":%{SPACE}"%{GREEDYDATA:CommitterName}"\n%{SPACE}%{SPACE}},\n%{SPACE}%{SPACE}"%{GREEDYDATA:timestamp}":%{SPACE}%{GREEDYDATA:CommitTimestamp},\n%{SPACE}%{SPACE}"%{GREEDYDATA:Author}":%{SPACE}{\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}"%{GREEDYDATA:Email}":%{SPACE}"%{GREEDYDATA:AuthorEmailAddress}",\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}"%{GREEDYDATA:AuthorName}":%{SPACE}"%{GREEDYDATA:AuthorsName}"\n%{SPACE}%{SPACE}},\n%{SPACE}%{SPACE}"%{GREEDYDATA:CommitDetails}":%{SPACE}%{GREEDYDATA:CommitTimestamp},\n%{SPACE}%{SPACE}"%{GREEDYDATA:CommitID}":%{SPACE}"%{GREEDYDATA:id},\n%{SPACE}%{SPACE}"%{GREEDYDATA:ShortHashID}":%{SPACE}%{GREEDYDATA:DisplayHash},\n%{SPACE}%{SPACE}"%{GREEDYDATA:CommitMessage}":%{SPACE}"%{GREEDYDATA:Comment}",\n%{SPACE}%{SPACE}"%{GREEDYDATA:CommitParent}":%{SPACE}[\n%{SPACE}%{SPACE}{\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}"%{GREEDYDATA:ParentCommitdetails}":%{SPACE}"%{GREEDYDATA:ParentCommitID}",\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}"%{GREEDYDATA:ParentCommitHashShort}":%{SPACE}"%{GREEDYDATA:ParentCommitShortHash}"\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}}\n%{SPACE}%{SPACE}]\n},

Can someone help me figure out whether the grok pattern above can be optimized, or whether something is amiss here?


(Christian Dahlqvist) #2

You should always try to use patterns that are as specific as possible, as using multiple DATA and/or GREEDYDATA patterns can be very inefficient. If space is used as a separator, NOTSPACE might be a good substitute.
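As a rough illustration of the idea (a sketch only, not a pattern tailored to your full log): a single `"key": "value"` line could be matched with more specific patterns like WORD and NOTSPACE instead of GREEDYDATA. NOTSPACE only works here if the value itself contains no spaces (e.g. an email address):

```
# Hypothetical sketch: match one "key": "value" line with
# specific patterns rather than GREEDYDATA.
filter {
  grok {
    match => {
      "message" => "\"%{WORD:key}\":%{SPACE}\"%{NOTSPACE:value}\""
    }
  }
}
```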


(Karan) #3

I'm new to grokking... what specific patterns can I use here instead of GREEDYDATA? The data is fetched via API calls pertaining to specific queries I made on Bitbucket.

I've tried to achieve the same for other tools' API logs as well, but to no avail: the fields break up properly in the constructor, but not when the pipelines execute.

Sonar API Logs -->

{
  "metric": "duplicated_lines",
  "periods": [
    {
      "index": 1,
      "value": "0"
    }
  ],
  "value": "38404"
},

grok pattern for the above -->

{\n%{SPACE}%{SPACE}"%{GREEDYDATA:metric1_sonar},\n%{SPACE}%{SPACE}"periods":%{SPACE}[\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}{\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}"index":%{SPACE}1,\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}"value":%{SPACE}"0"\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}}\n%{SPACE}%{SPACE}],\n%{SPACE}%{SPACE}"value":%{SPACE}"%{GREEDYDATA:metric1_count}"\n},


(Christian Dahlqvist) #4

I am not clear on what the data looks like. If the data is in JSON format, use a json filter instead of grok. Grok is very flexible, but it is not the ideal choice for all types of data.
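A minimal sketch of that approach, assuming each event's `message` field holds one complete JSON object like the Bitbucket sample above:

```
# Parse the JSON in the message field; the parser creates
# top-level fields such as committer, committerTimestamp,
# author, id, displayId, message and parents automatically,
# with no hand-written pattern needed.
filter {
  json {
    source => "message"
  }
}
```

Note that the trailing comma in the posted samples suggests each object is an element of a larger JSON array; the json filter needs valid standalone JSON per event, so the array would have to be split into events upstream.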


(Karan) #5

Yes, the API calls' output is in JSON format. Are there any specific patterns that can match the above log output type with respect to json filters in the pipelines, and can I build the same in the grok debug/constructor utility?


(Christian Dahlqvist) #6

If the data is in JSON, use the json filter, not grok.
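No patterns are needed at all with the json filter; it is not something you build in the grok debugger. If you want to keep the parsed data under one key rather than at the top level, a `target` can be set (a sketch; the `bitbucket` name here is just an example):

```
# Parse the JSON and nest the result under a single field.
filter {
  json {
    source => "message"
    target => "bitbucket"
  }
}
# Nested values are then addressable as, for example,
# [bitbucket][committer][emailAddress]
```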


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.