How to use the JSON filter correctly?

My application logs are in the JSON format below, and I'm unsure what the source field should be when using the JSON filter.

I would like all the fields to appear in the Kibana output, particularly the message field. The official documentation says: "For example, if you have JSON data in the message field: The above would parse the JSON from the message field."
But I don't have JSON in my message field (or any other field), and source is a required setting.

How should I handle this? Thanks in advance.


{"asctime": "2023-07-26 09:05:08,271", "levelname": "INFO", "pathname": "/app/realtime_processor/parallel_processor.py", "lineno": 280, "message": "comment_body: **SonarQube Quality Gate**\n\n[![Quality Gate passed](https://... does not match defined phrase: rerun ingress-garuda-prod", "namespace": "challenger-rover", "version": "jfrog.artifactory:5060/drpbuilds/list-groovy-real:583eb6a", "pod": "ingress-garuda-prod-647c5d755-7jmwx"}

Logstash config -

input {
  http {
    port => 9200
  }
}

filter {
    json {
       source => "message"
       remove_field => ["message"]
    }
    mutate {
      remove_field => [ "_index", "_score" ]
    }
}

output {
     elasticsearch {
       id => "elk-log-setup"
       hosts => ["<HOSTNAME>:9200"]
       index => "logstash-%{+YYYY.MM.dd}"
    }
 }

What is your input? You didn't say which input you are using.

You probably do. When an event enters the Logstash pipeline, its content is stored in a field named message, so you will probably have this:

{
  "message": "{\"asctime\": \"2023-07-26 09:05:08,271\", \"levelname\": \"INFO\", \"pathname\": \"/app/realtime_processor/parallel_processor.py\", \"lineno\": 280, \"message\": \"comment_body: **SonarQube Quality Gate**\\n\\n[![Quality Gate passed](https://... does not match defined phrase: rerun ingress-garuda-prod\", \"namespace\": \"challenger-rover\", \"version\": \"jfrog.artifactory:5060/drpbuilds/list-groovy-real:583eb6a\", \"pod\": \"ingress-garuda-prod-647c5d755-7jmwx\"}"
}

In this case the content of the message field is your original JSON log; to parse it, you need to use the following filter:

filter {
  json {
    source => "message"
  }
}

Since the event in Logstash already has a field named message containing your original log, the json filter will override that field with the value of the message field inside your log.
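If you want to keep both values, one possible sketch (assuming the json filter's target option, which directs parsed fields into a sub-object instead of the event root) would look like this:

```
filter {
  json {
    source => "message"
    # Parse into a separate object instead of the event root, so the
    # inner "message" field does not overwrite the outer one.
    target => "log"
  }
}
```

With this, your original fields would appear under log.*, e.g. log.message, log.levelname, and the raw message field stays intact.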

@leandrojmp : My bad, I've added the logstash config to the question.

"the json filter will override this field with the value of the message field in your log."

  • Can you elaborate a bit? Do you mean the message field from my original log will be retained if I use this config?
filter {
  json {
    source => "message"
  }
}

Also, will all the original fields be visible in the output (Kibana)?

Thanks !

Yes, that's what I said: when the json filter is executed, the message field inside your log will override the message field that Logstash created when the event entered the pipeline.

Have you tried it to see what results you get? The best way to find out is simply to test the pipeline.

Thank you @leandrojmp .
Yes, I've placed a request to test the configuration; I'll let you know how it goes :slight_smile:

Thanks again.

@leandrojmp : The strange behavior is that log lines starting with "source" appear on the Kibana dashboard, while log lines starting with "message" are completely missing.

I'm stumped; what could be the reason?

Present (seen on Kibana dashboard)
{"source": "nginx", "time": 1690544622.749, "resp_body_size": 58, "host": "business.org.dev.com", "address": "192.10.3.09", "request_length": 375, "method": "GET", "path": "/api/v1/", "status_code": 401, "user_agent": "check_http/v2.3.3 (nagios-plugins 2.3.3)", "resp_time": 0.003, "upstream_addr": "unix:/tmp/gunicorn.sock"}

Absent (not available in Kibana)

{"message": "Not Found: /api/v1/products/undefined/job_status/", "status_code": 404, "request": "<WSGIRequest: GET '/api/v1/products/undefined/job_status/'>", "request_id": "57de5b467fdc4cb6b4748c0497a271f7", "level": "WARNING", "lineno": 224, "filename": "validate.py", "created": 1690544635.1029072, "name": "django.request", "threadName": "MainThread", "module": "log", "timestamp": "2023-07-28T11:43:55.102995Z"}

{"message": "Exception with 401 response", "exc_info": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.8/site-packages/rest_framework/views.py\", line 497, in dispatch\n    self.initial(request, *args, **kwargs)\n  File \"/usr/local/lib/python3.8/site-packages/rest_framework/views.py\", line 415, in initial\n    self.check_permissions(request)\n  File \"/usr/local/lib/python3.8/site-packages/rest_framework/views.py\", line 333, in check_permissions\n    self.permission_denied(\n  File \"/usr/local/lib/python3.8/site-packages/rest_framework/views.py\", line 174, in permission_denied\n    raise exceptions.NotAuthenticated()\nrest_framework.exceptions.NotAuthenticated: Authentication credentials were not provided.", "exc_type": "NotAuthenticated", "exc_message": "Authentication credentials were not provided.", "exc_args": [], "exc_filename": "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", "exc_module": "views.py", "exc_lineno": 174, "status_code": 401, "req_method": "GET", "req_path": "/api/v1/", "request_id": "e173fd9dde714660bf09bf1273b57f96", "level": "WARNING", "lineno": 55, "filename": "exceptions.py", "created": 1690544622.748727, "name": "business.core.exceptions", "threadName": "MainThread", "module": "exceptions", "timestamp": "2023-07-28T11:43:42.748939Z"}

{"message": "Role for", "role": ["read", "transfer"], "permitted": true, "request_id": "be64716819a53a89j8765tdbv4sd", "level": "INFO", "username": "roney cole", "lineno": 131, "filename": "validations.py", "created": 1690544615.5304928, "name": "business.core.validations", "threadName": "MainThread", "module": "validations", "timestamp": "2023-07-28T11:43:35.530529Z"}

It is not clear what the issue is: is the entire log missing, or just the message field from those logs?

If it is just the message field, it is probably because of this:

    json {
       source => "message"
       remove_field => ["message"]
    }

You have a remove_field for the message field. Normally this would remove the source message, but since your source message also contains a message field, it removes that field instead.

Remove this remove_field from your json filter.
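If you still want to drop the raw payload afterwards, a hedged sketch (not from the thread, just one common pattern) is to rename the raw event into @metadata before parsing; @metadata fields are never sent to the output, so the inner message field from your log survives:

```
filter {
  # Move the raw payload out of the way first, so parsing cannot
  # clash with the inner "message" field of the log itself.
  mutate {
    rename => { "message" => "[@metadata][raw]" }
  }
  json {
    # @metadata is dropped automatically before output.
    source => "[@metadata][raw]"
  }
}
```

This keeps every parsed field, including the log's own message, without any remove_field.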

@leandrojmp - Entire log lines starting with "message" are missing.
E.g.
{"message": "Role for", "role": ["read", "transfer"], "permitted": true, "request_id": "be64716819a53a89j8765tdbv4sd", "level": "INFO", "username": "roney cole", "lineno": 131, "filename": "validations.py", "created": 1690544615.5304928, "name": "business.core.validations", "threadName": "MainThread", "module": "validations", "timestamp": "2023-07-28T11:43:35.530529Z"}

filter configuration is

filter {
    json {
       source => "message"
     }
    mutate {
      remove_field => [ "_id", "headers", "_type", "character", "@version", "_index", "_score", "message" ]
    }
}

If I don't remove the message field, the entire log line/document ends up in the message field three times.

I mean, the same log line is repeated in the message field three times.

@leandrojmp : I have attached a couple of screenshots of one document/event from the dashboard of one of the services.
I hope that provides more context than my explanation :slight_smile:

Well, this is really confusing.

What is the source of your message? This looks way different from the other examples you shared.

The message in your screenshot looks like a request that you would make to Elasticsearch. It is not clear why you are sending this to Logstash; can you provide more context?

But if the entire log is missing, did you check the Logstash logs to see if there is any error?

What is the source of your message? This looks way different from the other examples you shared.

I'm not exactly sure what that means. I use fluentd to aggregate logs and funnel them to Logstash.
On logstash config, http input plugin is used.

The example logs I shared are in the same format as my production logs.

The warnings below appear in the Logstash logs:

[2023-07-20T00:06:30,436][WARN ][logstash.filters.json    ] Error parsing json {:source=>"message", :raw=>"INFO |2023-07-20 00:04:29 |1 | manageQueue.py 335 | Scan details: {'main': {}}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'INFO': was expecting 'null', 'true', 'false' or NaN
 at [Source: (byte[])"INFO |2023-07-20 00:04:29 |1 | manageQueue.py 335 | Scan details: {'main': {}}"; line: 1, column: 6]>}
[2023-07-20T00:06:35,900][WARN ][logstash.filters.json    ] Error parsing json {:source=>"message", :raw=>"INFO |2023-07-20 00:04:46 |1 | manageQueue.py 335 | Scan details: {'main': {}, 'staging': {}}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'INFO': was expecting 'null', 'true', 'false' or NaN
 at [Source: (byte[])"INFO |2023-07-20 00:04:46 |1 | manageQueue.py 335 | Scan details: {'main': {}, 'staging': {}}"; line: 1, column: 6]>}
[2023-07-20T00:06:41,900][WARN ][logstash.filters.json    ] Error parsing json {:source=>"message", :raw=>"INFO |2023-07-20 00:05:09 |1 | manageQueue.py 324 | There is nothing to process on the queue as it is empty. Returning...", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'INFO': was expecting 'null', 'true', 'false' or NaN
 at [Source: (byte[])"INFO |2023-07-20 00:05:09 |1 | manageQueue.py 324 | There is nothing to process on the queue as it is empty. Returning..."; line: 1, column: 6]>}
[2023-07-20T00:06:46,908][WARN ][logstash.filters.json    ] Error parsing json {:source=>"message", :raw=>"Thu Jul 20 00:06:38 2023 sent heartbeat", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'Thu': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"Thu Jul 20 00:06:38 2023 sent heartbeat"; line: 1, column: 5]>}

What I mean is that this is the message you shared in your first post.

{"asctime": "2023-07-26 09:05:08,271", "levelname": "INFO", "pathname": "/app/realtime_processor/parallel_processor.py", "lineno": 280, "message": "comment_body: **SonarQube Quality Gate**\n\n[![Quality Gate passed](https://... does not match defined phrase: rerun ingress-garuda-prod", "namespace": "challenger-rover", "version": "jfrog.artifactory:5060/drpbuilds/list-groovy-real:583eb6a", "pod": "ingress-garuda-prod-647c5d755-7jmwx"}

I assumed that this is what was being sent to Logstash.

But the message in your screenshot is completely different: it has a lot more fields, and what look like multiple messages inside the original message field.

So it is not clear what is exactly being sent to Logstash.

Is it messages like the first one you shared, or something like the one in your screenshot, with multiple lines starting with {"index"? Also, you didn't say how you are sending it.

This makes a big difference for troubleshooting. For example, if you send to Logstash with a Content-Type of application/json, your message would already be parsed on input.
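If you want parsing to happen only in the filter stage, a sketch using the http input's additional_codecs setting (which by default maps application/json to the json codec) could disable that automatic parsing; this is an illustration, not a confirmed fix for this thread:

```
input {
  http {
    port => 9200
    # Treat application/json bodies as plain text, so the json
    # filter in the filter block is the only place parsing happens.
    additional_codecs => { "application/json" => "plain" }
  }
}
```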

Can you provide a plain-text sample of exactly what you are sending to Logstash?

Those warnings are expected: the message field does not contain JSON, so the json filter obviously cannot parse it.
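One way to keep those plain-text lines (heartbeats, pipe-delimited INFO lines) from generating warnings is to guard the filter with a conditional, a sketch along these lines:

```
filter {
  # Only attempt JSON parsing when the line looks like a JSON object;
  # plain-text lines such as "... sent heartbeat" pass through untouched.
  if [message] =~ /^\s*\{/ {
    json {
      source => "message"
      # Events that still fail to parse get tagged for later inspection.
      tag_on_failure => ["_jsonparsefailure"]
    }
  }
}
```

Lines that are not JSON would then be indexed as-is instead of producing parser warnings.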

@leandrojmp : Yes, the application logs are in the format below, and they are sent to Logstash from fluentd.

fluentd config -

      <source>
        @type tail
        path /var/log/containers/*.log
        # exclude Fluentd logs
        exclude_path /var/log/containers/*fluentd*.log
        pos_file /opt/bitnami/fluentd/logs/buffers/fluentd-docker.pos
        tag kubernetes.*
        read_from_head true
        <parse>
          @type regexp
          expression ^(?<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.[^Z]*Z)\s(?<stream>[^\s]+)\s(?<character>[^\s])\s(?<log>.*)$
          @time_format %Y-%m-%dT%H:%M:%S.%NZ
        </parse>
      </source>
	  
	 logstash.conf: |
      <match **>
        @type elasticsearch
        @id elasticsearch
        @log_level debug
        hosts ELKHOSTNAME_1:9200,ELKHOSTNAME_2:9200
        ssl_verify false
        verify_es_version_at_startup false
        default_elasticsearch_version 7
        request_timeout 100s
      </match>

application logs sample --

{"source": "nginx", "time": 1690544622.749, "resp_body_size": 58, "host": "business.org.dev.com", "address": "192.10.3.09", "request_length": 375, "method": "GET", "path": "/api/v1/", "status_code": 401,  "user_agent": "check_http/v2.3.3 (nagios-plugins 2.3.3)", "resp_time": 0.003, "upstream_addr": "unix:/tmp/gunicorn.sock"}

{"message": "Not Found: /api/v1/products/undefined/job_status/", "status_code": 404, "request": "<WSGIRequest: GET '/api/v1/products/undefined/job_status/'>", "request_id": "57de5b467fdc4cb6b4748c0497a271f7", "level": "WARNING", "lineno": 224, "filename": "validate.py", "created": 1690544635.1029072, "name": "django.request", "threadName": "MainThread", "module": "log", "timestamp": "2023-07-28T11:43:55.102995Z"}

{"message": "Exception with 401 response", "exc_info": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.8/site-packages/rest_framework/views.py\", line 497, in dispatch\n    self.initial(request, *args, **kwargs)\n  File \"/usr/local/lib/python3.8/site-packages/rest_framework/views.py\", line 415, in initial\n    self.check_permissions(request)\n  File \"/usr/local/lib/python3.8/site-packages/rest_framework/views.py\", line 333, in check_permissions\n    self.permission_denied(\n  File \"/usr/local/lib/python3.8/site-packages/rest_framework/views.py\", line 174, in permission_denied\n    raise exceptions.NotAuthenticated()\nrest_framework.exceptions.NotAuthenticated: Authentication credentials were not provided.", "exc_type": "NotAuthenticated", "exc_message": "Authentication credentials were not provided.", "exc_args": [], "exc_filename": "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", "exc_module": "views.py", "exc_lineno": 174, "status_code": 401, "req_method": "GET", "req_path": "/api/v1/", "request_id": "e173fd9dde714660bf09bf1273b57f96", "level": "WARNING", "lineno": 55, "filename": "exceptions.py", "created": 1690544622.748727, "name": "business.core.exceptions", "threadName": "MainThread", "module": "exceptions", "timestamp": "2023-07-28T11:43:42.748939Z"}

{"message": "Role for", "role": ["read", "transfer"], "permitted": true, "request_id": "be64716819a53a89j8765tdbv4sd", "level": "INFO", "username": "roney cole", "lineno": 131, "filename": "validations.py", "created": 1690544615.5304928, "name": "business.core.validations", "threadName": "MainThread", "module": "validations", "timestamp": "2023-07-28T11:43:35.530529Z"}
