Grok parsing issue on Bitbucket/Sonar API Logs


(Karan) #1

Hi,

In my Logstash pipeline, I'm trying to use grok to segregate/filter out meaningful bits of information from the API call output logs of tools like Bitbucket/SonarQube etc., in order to fetch metrics data into Elasticsearch.

The pattern works fine on grokdebug.herokuapp.com when I test it there; however, I get _grokparsefailure under tags on the Kibana dashboard for this particular index:

API Call Output -->

{
  "committer": {
    "emailAddress": "xyz@aon.com",
    "name": "XYZ"
  },
  "committerTimestamp": 1512496703000,
  "author": {
    "emailAddress": "xyz@aon.com",
    "name": "XYZ"
  },
  "authorTimestamp": 1512456913000,
  "id": "86620bfe84d0cf250a86189c7bac7c0433a9a056",
  "displayId": "86620bfe84d",
  "message": "Create",
  "parents": [
    {
      "id": "a3a100c5ea865a813275759377f1b1d07e4f7db7",
      "displayId": "a3a100c5ea8"
    }
  ]
},

grok pattern for the above log output -->

{\n%{SPACE}%{SPACE}"%{GREEDYDATA:details}":%{SPACE}{\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}"%{GREEDYDATA:Email}":%{SPACE}"%{GREEDYDATA:CommitterEmailAddress}",\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}"%{GREEDYDATA:Name}":%{SPACE}"%{GREEDYDATA:CommitterName}"\n%{SPACE}%{SPACE}},\n%{SPACE}%{SPACE}"%{GREEDYDATA:timestamp}":%{SPACE}%{GREEDYDATA:CommitTimestamp},\n%{SPACE}%{SPACE}"%{GREEDYDATA:Author}":%{SPACE}{\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}"%{GREEDYDATA:Email}":%{SPACE}"%{GREEDYDATA:AuthorEmailAddress}",\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}"%{GREEDYDATA:AuthorName}":%{SPACE}"%{GREEDYDATA:AuthorsName}"\n%{SPACE}%{SPACE}},\n%{SPACE}%{SPACE}"%{GREEDYDATA:CommitDetails}":%{SPACE}%{GREEDYDATA:CommitTimestamp},\n%{SPACE}%{SPACE}"%{GREEDYDATA:CommitID}":%{SPACE}"%{GREEDYDATA:id},\n%{SPACE}%{SPACE}"%{GREEDYDATA:ShortHashID}":%{SPACE}%{GREEDYDATA:DisplayHash},\n%{SPACE}%{SPACE}"%{GREEDYDATA:CommitMessage}":%{SPACE}"%{GREEDYDATA:Comment}",\n%{SPACE}%{SPACE}"%{GREEDYDATA:CommitParent}":%{SPACE}[\n%{SPACE}%{SPACE}{\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}"%{GREEDYDATA:ParentCommitdetails}":%{SPACE}"%{GREEDYDATA:ParentCommitID}",\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}"%{GREEDYDATA:ParentCommitHashShort}":%{SPACE}"%{GREEDYDATA:ParentCommitShortHash}"\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}}\n%{SPACE}%{SPACE}]\n},

Can someone help me figure out whether the grok pattern above can be optimized, or whether something is amiss here?


(Christian Dahlqvist) #2

You should always try to use patterns that are as specific as possible, as using multiple DATA and/or GREEDYDATA patterns can be very inefficient. If space is used as a separator, NOTSPACE might be a good substitute.
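As a rough illustration of the idea (a sketch only, not a pattern tailored to your full log): a single `"key": "value"` line could be matched with more specific patterns like WORD and NOTSPACE instead of GREEDYDATA. NOTSPACE only works here if the value itself contains no spaces (e.g. an email address):

```
# Hypothetical sketch: match one "key": "value" line with
# specific patterns rather than GREEDYDATA.
filter {
  grok {
    match => {
      "message" => "\"%{WORD:key}\":%{SPACE}\"%{NOTSPACE:value}\""
    }
  }
}
```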


(Karan) #3

I'm new to grokking... what specific patterns can I use here instead of GREEDYDATA? The data is fetched via API calls pertaining to specific queries I made on Bitbucket.

I've tried to achieve the same for other tools' API logs as well, but to no avail: the fields break up properly in the constructor, but not when the pipelines execute.

Sonar API Logs -->

{
  "metric": "duplicated_lines",
  "periods": [
    {
      "index": 1,
      "value": "0"
    }
  ],
  "value": "38404"
},

grok pattern for the above -->

{\n%{SPACE}%{SPACE}"%{GREEDYDATA:metric1_sonar},\n%{SPACE}%{SPACE}"periods":%{SPACE}[\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}{\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}"index":%{SPACE}1,\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}%{SPACE}"value":%{SPACE}"0"\n%{SPACE}%{SPACE}%{SPACE}%{SPACE}}\n%{SPACE}%{SPACE}],\n%{SPACE}%{SPACE}"value":%{SPACE}"%{GREEDYDATA:metric1_count}"\n},


(Christian Dahlqvist) #4

I am not clear on what the data looks like. If the data is in JSON format, use a json filter instead of grok. Grok is very flexible, but it is not the ideal choice for all types of data.
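A minimal sketch of that approach, assuming each event's `message` field holds one complete JSON object like the Bitbucket sample above:

```
# Parse the JSON in the message field; the parser creates
# top-level fields such as committer, committerTimestamp,
# author, id, displayId, message and parents automatically,
# with no hand-written pattern needed.
filter {
  json {
    source => "message"
  }
}
```

Note that the trailing comma in the posted samples suggests each object is an element of a larger JSON array; the json filter needs valid standalone JSON per event, so the array would have to be split into events upstream.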


(Karan) #5

Yes, the API calls' output is in JSON format. Are there any specific patterns that can match the above log output type with respect to json filters in the pipelines, and can I build the same in the grok debug/constructor utility?


(Christian Dahlqvist) #6

If the data is in JSON, use the json filter, not grok.
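No patterns are needed at all with the json filter; it is not something you build in the grok debugger. If you want to keep the parsed data under one key rather than at the top level, a `target` can be set (a sketch; the `bitbucket` name here is just an example):

```
# Parse the JSON and nest the result under a single field.
filter {
  json {
    source => "message"
    target => "bitbucket"
  }
}
# Nested values are then addressable as, for example,
# [bitbucket][committer][emailAddress]
```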


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.