Parse SonarQube logs

Hi,

I have been using Logstash for a month now and diving deep into its filters. I am facing issues parsing SonarQube logs. I have configured Filebeat to send the five log files that SonarQube generates and added a field to differentiate them. I can see the logs arriving in Kibana, but I am not able to break up the message with the grok filter. I checked the grok pattern with the Grok Debugger and it works fine, so I cannot trace why the message field is not being split. Below are the Filebeat and Logstash configurations and a sample message.

filebeat configs:

filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /opt/sonarqube/logs/access.log
  fields:
    log_type: access
- type: log
  enabled: true
  paths:
    - /opt/sonarqube/logs/web.log
  fields:
    log_type: web
- type: log
  enabled: true
  paths:
    - /opt/sonarqube/logs/ce.log
  fields:
    log_type: ce
- type: log
  enabled: true
  paths:
    - /opt/sonarqube/logs/es.log
  fields:
    log_type: es
- type: log
  enabled: true
  paths:
    - /opt/sonarqube/logs/sonar.log
  fields:
    log_type: sonar

logstash configs:

input {
  beats {
    port => 5044
  }
}

filter {
  if [host] == "sonarqube" {
    if [log_type] == "sonar,es,ce,web" {
      grok {
        match => { "message" => "%{YEAR:year}.%{MONTHNUM:month}.%{MONTHDAY:day} %{TIME} %{LOGLEVEL:loglevel} %{WORD:logtype}%{SYSLOG5424SD:emptybraces}%{SYSLOG5424SD:syslogclass} (?m)%{GREEDYDATA:log}" }
      }
    }
    else if [log_type] == "access" {
      grok {
        match => { "message" => "%{IP:client_ip} %{USER:ident} %{USER:auth} [%{HTTPDATE:apache_timestamp}] "%{WORD:method} /%{NOTSPACE:request_page} HTTP/%{NUMBER:http_version}" %{NUMBER:server_response}" }
      }
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
  if [host] in ["jenkins","sonarqube","artifactory"] {
    elasticsearch {
      hosts => "xx.xx.xxx.xx:9200"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[host]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    }
  }
}

message:
65.31.78.194 - - [01/Mar/2018:17:02:08 +0000] "GET /api/qualitygates/list HTTP/1.1" 200 62 "http://13.58.227.31:9000/quality_gates" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36" "AWHiK6vfp97mp7BlAACg"

If someone could please take a look at this issue, it would be a great help!
Thanks in advance

That should get you a syntax error, since you have double quotes embedded in a double-quoted string. You also need to escape the square brackets. Try

"message" => '%{IP:client_ip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:apache_timestamp}\] "%{WORD:method} /%{NOTSPACE:request_page} HTTP/%{NUMBER:http_version}" %{NUMBER:server_response}'

Thank you @Badger
I have changed the pattern as you suggested, but the output in Kibana is still the same: I am still not able to break up the message field. Please take a look at the image.

There is no _grokparsefailure tag, which suggests your conditional logic is not even attempting the grok. In Kibana you have a fields.log_type, but in Logstash you are testing log_type. I do not think those are the same thing.


@Badger, I removed all the conditions from my filter and I am now able to parse the logs, as shown below.
{
  "_index": "filebeat-sonarqube-6.2.2-2018.03.01",
  "_type": "doc",
  "_id": "jl8N42EBlicRN05UnlpX",
  "_version": 1,
  "_score": null,
  "_source": {
    "request_page": "api/navigation/global",
    "http_version": "1.1",
    "beat": {
      "name": "sonarqube",
      "hostname": "sonarqube",
      "version": "6.2.2"
    },
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "apache_timestamp": "01/Mar/2018:19:32:15 +0000",
    "source": "/opt/sonarqube/logs/access.log",
    "host": "sonarqube",
    "message": "65.31.78.194 - - [01/Mar/2018:19:32:15 +0000] \"GET /api/navigation/global HTTP/1.1\" 200 425 \"http://13.58.227.31:9000/quality_gates\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36\" \"AWHiK6vfp97mp7BlAACt\"",
    "client_ip": "65.31.78.194",
    "method": "GET",
    "server_response": "200",
    "auth": "-",
    "@timestamp": "2018-03-01T19:32:23.374Z",
    "offset": 51403,
    "ident": "-",
    "@version": "1",
    "prospector": {
      "type": "log"
    },
    "fields": {
      "log_type": "access"
    }
  },
  "fields": {
    "@timestamp": [
      "2018-03-01T19:32:23.374Z"
    ]
  },
  "sort": [
    1519932743374
  ]
}

In my use case, I am trying to collect logs from five different components and parse them all. Can you please tell me how I should configure Logstash so that it can differentiate between the different log files and hostnames?

Also, how can I make the event timestamp match the time the log line was created, rather than the time Logstash pushed the logs to Elasticsearch?


The Filebeat documentation says:

By default, the fields that you specify here will be grouped under a fields sub-dictionary in the output document. To store the custom fields as top-level fields, set the fields_under_root option to true.

So when you do something like

- /opt/sonarqube/logs/access.log
  fields:
    log_type: access

it adds a field to the event called fields.log_type. You can test for that in Logstash using

 if [fields][log_type] == "access"

or you can add fields_under_root: true to your filebeat configuration (a sketch is shown below) and test for

 if [log_type] == "access"

just as you were at first. If you want to test whether a field matches one of a set of values, then instead of

if [log_type] == "sonar,es,ce,web" {

try

if [log_type] in [ "sonar", "es", "ce", "web"] {
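Here is the fields_under_root sketch mentioned above, a minimal example of one prospector (assuming the same paths as in your configuration):

- type: log
  enabled: true
  paths:
    - /opt/sonarqube/logs/access.log
  fields:
    log_type: access
  fields_under_root: true   # log_type becomes a top-level field instead of fields.log_type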

Use a date filter to set @timestamp to the time in the message.
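For the access log, for example, the apache_timestamp field captured by your grok can be parsed like this (a sketch; @timestamp is the filter's default target):

date {
  # "01/Mar/2018:17:02:08 +0000" is the HTTPDATE layout
  match => [ "apache_timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  # @timestamp now reflects when the log line was written,
  # not when Logstash processed the event
}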


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.