Logstash failing with large pretty JSON files

(John Michael Burke) #1

I am getting a failure in my stack: Filebeat -> Logstash (JSON filter) -> Elasticsearch.

My JSON files are valid: I checked them on jsonlint.com and got 'Valid JSON'. Each file is roughly 1300 lines. I have added multiline.max_lines: 3000 to my filebeat.yml.

The error I get:
[2019-04-09T21:16:09,422][WARN ][logstash.filters.json ] Error parsing json { ... my json stuff here...
:exception=>#<LogStash::Json::ParserError: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte)" ... a snippet of my json stuff here ... "targe"[truncated 27903 bytes]; line: 1, column: 1]).

I believe the issue might be that either Filebeat or Logstash is cutting off a portion of my JSON, so that it becomes invalid. The failure occurs when I do not use a mutate filter, and it still occurs when I use mutate to strip the pretty-printing characters from the JSON. However, when I use an online tool to remove those characters, the file ingests properly. Please help. Unfortunately I am unable to share any of the actual content of the JSON.

Here is my filebeat.yml:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /usr/share/filebeat/redactedterm/*.json
  multiline.pattern: '.'
  multiline.match: after
  multiline.max_lines: 3000

output.logstash:
  hosts: ["logstash:5044"]

setup.kibana:
  host: "kibana:5601"

setup.template.settings:
  index.number_of_shards: 10

setup.dashboards.enabled: true

Here is my logstash.conf:

input {
  beats {
    port => 5044
  }
}

filter {
  mutate {
    copy => { "message" => "payload" }
  }
  mutate {
    gsub => [
      "payload", " ", "",
      "payload", "\n", "",
      "payload", "\r", "",
      "payload", "\t", ""
    ]
  }
  json {
    source => "payload"
  }
}

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    manage_template => true
    template => "/usr/share/logstash/logstash-template.json"
    template_name => "logstash*"
    template_overwrite => true
  }
}

#2

Unless you have enabled config.support_escapes, the escape sequences in those gsub patterns do not do what you think they do (and could conceivably cause issues).

You appear to be running logstash on UNIX. You do not need to remove \t, \n or " ", the json filter will work around them. You will need to remove \r, but you need a literal Ctrl/M in the filter. Use Ctrl/V Ctrl/M to enter that.

If you decide you want to remove \n anyway, then use a literal newline in the configuration:

mutate { gsub => [ "payload", "
", "" ] }

I realize you say it still chokes when you remove the mutate. I would expect beats and the lumberjack protocol to have no issue with a 30 KB message.

Can you replace your logstash output with a file output so that filebeat just copies the file from disk to disk and see if the output looks OK?

(John Michael Burke) #3

Hey sorry for the late response. Honestly if whitespace is not an issue for the filter it does not bother me to keep it included.

I did what you suggested, except that I sent the result to stdout {} instead of a file, and the message it prints is my full JSON content. As long as I leave out the json filter, Logstash does send the event to Elasticsearch; however, it does not split the fields (e.g. 'this.is.my.datapoint.split.nicely.with.periods') the way the json filter has done for me in the past.

The stdout output has a few extra fields; for example, 'tags' contains:

"tags" => [
logstash | [0] "beats_input_codec_plain_applied",
logstash | [1] "_jsonparsefailure"
logstash | ],

I have removed my gsub mutate and the failure comes from the json filter. So I'm wondering if I need to try something else?

Thank you for the help by the way!!

#4

What is the first character of your JSON string? If it is [ then you cannot use a json filter without a target option. If it is { then you should be OK.
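If it is awkward to eyeball a large file, the check can be done with a few lines of Python. This is only a sketch for illustration: the helper name top_level_kind is made up, and it simply reports whether the document's top-level value is an object, an array, or a scalar.

```python
import json


def top_level_kind(text: str) -> str:
    """Report the kind of the top-level JSON value in `text`.

    Hypothetical helper, not part of Logstash or Filebeat.
    """
    value = json.loads(text)
    if isinstance(value, dict):
        return "object"   # document starts with {
    if isinstance(value, list):
        return "array"    # document starts with [
    return "scalar"       # bare string, number, true/false/null
```

Calling it on the contents of one of the log files should return "object" for a document that starts with {.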

(John Michael Burke) #5

This is the start:

{
"MF": {},
...

What I find very bizarre is that if I don't save my JSON pretty-printed or with extra whitespace characters, it is processed fine by the json filter.

I feel like Filebeat is doing fine, but Logstash's max lines setting might be at fault? However, if you attempt to change the max lines setting in the Logstash input, it complains that this should be done in Filebeat.

As for 'why don't I just save my logs without pretty-printed JSON?': we have numerous older logs that were saved that way. I guess I could just write a script to convert them all. But at the end of the day it would be great if Logstash could handle this.
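For reference, such a conversion script could be sketched in Python roughly as follows. The function names compact_json_text and compact_json_file are hypothetical, and the sketch assumes each file holds exactly one JSON document encoded as UTF-8. It re-serializes the document on a single line and ends it with a newline.

```python
import json
from pathlib import Path


def compact_json_text(text: str) -> str:
    """Re-serialize pretty-printed JSON as one line ending in a newline."""
    data = json.loads(text)
    # separators=(",", ":") drops the spaces json.dumps adds by default.
    return json.dumps(data, separators=(",", ":"), ensure_ascii=False) + "\n"


def compact_json_file(src: Path, dst: Path) -> None:
    """Read one JSON document from `src` and write the compact form to `dst`."""
    text = src.read_text(encoding="utf-8")
    dst.write_text(compact_json_text(text), encoding="utf-8")
```

Writing to a separate destination path rather than overwriting in place keeps the original logs intact until the converted copies have been checked.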

(John Michael Burke) #6

To help debug I have created a fake json file which somewhat matches the general skeleton of my JSON and it exhibits the same parser failure with the JSON filter.

exception=>#<LogStash::Json::ParserError: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte)"{

Lemme know if this helps.

https://drive.google.com/file/d/1Ibgg6Mqhm6r-PghUsrlbLWk7qJhDXbee/view?usp=sharing

#7

When I ingest that fake JSON using Filebeat, the json filter parses it just fine. This is with 6.7.1.

(John Michael Burke) #8

I switched my ELK_VERSION from 6.6.0 to 6.7.1 and verified that Logstash runs on 6.7.1, but the problem persists. Could you confirm that you are running with these Logstash/Filebeat settings:

input {
  beats {
    port => 5044
  }
}

filter {
  json {
    source => "message"
  }
}

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    manage_template => true
    template => "/usr/share/logstash/logstash-template.json"
    template_name => "logstash*"
    template_overwrite => true
  }
  stdout {}
}

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /usr/share/filebeat/where_you_put_the_json/*.json
  multiline.pattern: '.'
  multiline.match: after
  multiline.max_lines: 30000
  multiline.max_bytes: 1000 MiB

output.logstash:
  hosts: ["logstash:5044"]

setup.kibana:
  host: "kibana:5601"

setup.template.settings:
  index.number_of_shards: 10

setup.dashboards.enabled: true

#9

I'm not using an elasticsearch output, but the failure occurs before the event reaches the output so that should make no difference. The rest looks the same.

(John Michael Burke) #10

Well I figured out my issue and I'm sure it is common knowledge but definitely worthwhile to know.

All of my JSON logs are created using the Python json library, which does not add a newline to the end of the JSON file.

I found that regardless of whether I use Filebeat's multiline setting or not, the Logstash json filter always cuts off the last line of my JSON. Once I began adding a trailing newline, the problem miraculously disappeared. I don't know if a trailing newline is some sort of standard for JSON files, but if it is not, I feel like this might be worth fixing in Logstash.
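For anyone hitting the same thing, here is a minimal sketch of the workaround in Python. The helper names dump_with_newline and ends_with_newline are made up for illustration. The underlying assumption, which I have not verified in the Filebeat or Logstash source, is that a line-oriented reader only emits the last line once it sees the terminating newline.

```python
import json
import os


def dump_with_newline(obj, path):
    """Write pretty-printed JSON and terminate the file with a newline."""
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        json.dump(obj, fh, indent=4)
        fh.write("\n")  # json.dump itself does not write a trailing newline


def ends_with_newline(path):
    """Return True if the last byte of the file is a newline."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        if fh.tell() == 0:
            return False  # empty file
        fh.seek(-1, os.SEEK_END)
        return fh.read(1) == b"\n"
```

ends_with_newline can be run over the existing log files to find the ones that still need a newline appended.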

Cheers and thank you for the assistance Badger!
