Logstash failing with large pretty JSON files

John_Michael_Burke · April 9, 2019, 9:37pm

I am getting a failure in my stack: Filebeat -> Logstash (JSON filter) -> Elasticsearch.

My JSON files are valid; I checked them on jsonlint.com and got 'Valid JSON'. Each file is roughly 1300 lines, so I have added multiline.max_lines: 3000 to my filebeat.yml.

The error I get:
[2019-04-09T21:16:09,422][WARN ][logstash.filters.json ] Error parsing json { ... my json stuff here...
:exception=>#<LogStash::Json::ParserError: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte)" ... a snippet of my json stuff here ... "targe"[truncated 27903 bytes]; line: 1, column: 1]).

I believe either Filebeat or Logstash is cutting off part of my JSON, which makes it invalid. The failure occurs when I do not use a mutate filter, and it still fails when I try to strip the pretty-printing whitespace from the JSON with mutate. When I use an online tool to remove the whitespace instead, the file ingests properly. Please help. Unfortunately I am unable to share any of the actual content of the JSON.

Here is my filebeat.yml:

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /usr/share/filebeat/redactedterm/*.json
    multiline.pattern: '.'
    multiline.match: after
    multiline.max_lines: 3000

output.logstash:
  hosts: ["logstash:5044"]

setup.kibana:
  host: "kibana:5601"

setup.template.settings:
  index.number_of_shards: 10

setup.dashboards.enabled: true

Here is my logstash.conf:

input {
  beats {
    port => 5044
  }
}

filter {
  mutate {
    copy => { "message" => "payload" }
  }
  mutate {
    gsub => [
      "payload", " ", "",
      "payload", "\n", "",
      "payload", "\r", "",
      "payload", "\t", ""
    ]
  }
  json {
    source => "payload"
  }
}

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    manage_template => true
    template => "/usr/share/logstash/logstash-template.json"
    template_name => "logstash*"
    template_overwrite => true
  }
}

Badger · April 9, 2019, 10:40pm

Unless you have enabled config.support_escapes, the escape sequences in your gsub patterns do not do what you think they do (and could conceivably cause issues).

You appear to be running logstash on UNIX. You do not need to remove \t, \n or " ", the json filter will work around them. You will need to remove \r, but you need a literal Ctrl/M in the filter. Use Ctrl/V Ctrl/M to enter that.

If you decide you want to remove \n anyway, then use a literal newline in the configuration

mutate { gsub => [ "payload", "
", "" ] }

I realize you say it still chokes when you remove the mutate. I would expect beats and the lumberjack protocol to have no issue with a 30 KB message.

Can you replace your logstash output with a file output so that filebeat just copies the file from disk to disk and see if the output looks OK?

John_Michael_Burke · April 10, 2019, 10:54pm

Hey sorry for the late response. Honestly if whitespace is not an issue for the filter it does not bother me to keep it included.

I did what you suggested, but sent the result to stdout {} instead of a file, and the message it prints is my full JSON content. As long as I leave out the json filter, Logstash does ship the event to Elasticsearch; however, it does not split the content into fields like 'this.is.my.datapoint.split.nicely.with.periods' the way the json filter has done for me in the past.

In the stdout it has a few extra fields for example 'tags' mentions...

"tags" => [
    [0] "beats_input_codec_plain_applied",
    [1] "_jsonparsefailure"
],

I have removed my gsub mutate and the failure comes from the json filter. So I'm wondering if I need to try something else?

Thank you for the help by the way!!

Badger · April 10, 2019, 11:59pm

What is the first character of your JSON string? If it is [ then you cannot use a json filter without a target option. If it is { then you should be OK.
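One quick way to answer that question for a file on disk is to look at its first non-whitespace character. The following is only a minimal Python sketch; the function name top_level_kind is hypothetical and not part of Logstash or Filebeat.

```python
def top_level_kind(text: str) -> str:
    """Classify a JSON document by its first non-whitespace character."""
    # Skip a possible byte-order mark and any leading whitespace.
    stripped = text.lstrip("\ufeff \t\r\n")
    if stripped.startswith("{"):
        return "object"   # top-level object
    if stripped.startswith("["):
        return "array"    # top-level array
    return "other"        # scalar, empty input, or not JSON at all


if __name__ == "__main__":
    print(top_level_kind('{\n"MF": {}\n}'))
```

This only inspects the first character; it does not validate the rest of the document.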

John_Michael_Burke · April 11, 2019, 3:40pm

This is the start:

{
"MF": {},
...

What I find very bizarre is that if I don't save my JSON pretty-printed or with extra whitespace characters, it is processed fine by the json filter.

I feel like Filebeat is doing fine, but Logstash's max lines setting might be at fault? However, if you attempt to change the max lines setting on the Logstash input, it complains that this should be done in Filebeat.

As for 'why don't I just save my logs without pretty-printed JSON?': we have numerous older logs that were saved that way. I guess I could just write a script to convert them all. But at the end of the day it would be great if Logstash could handle this.
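The conversion script mentioned here could look roughly like the sketch below. It is an illustration only: the function name compact_json_file and the idea of writing to a separate destination file are assumptions, not something taken from this thread. It rewrites a pretty-printed file as a single compact line and ends that line with a newline so line-oriented readers see it as complete.

```python
import json
from pathlib import Path


def compact_json_file(src: Path, dst: Path) -> None:
    """Rewrite a pretty-printed JSON file as one compact line ending in a newline."""
    # Parse the whole document so that invalid input fails loudly here.
    data = json.loads(src.read_text(encoding="utf-8"))
    # separators=(",", ":") drops the spaces json.dumps adds by default.
    compact = json.dumps(data, separators=(",", ":"))
    dst.write_text(compact + "\n", encoding="utf-8")
```

To convert a whole directory, one could loop over Path(some_dir).glob("*.json") and call the function on each file, writing the results to a separate output directory.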

John_Michael_Burke · April 11, 2019, 4:54pm

To help debug I have created a fake json file which somewhat matches the general skeleton of my JSON and it exhibits the same parser failure with the JSON filter.

exception=>#<LogStash::Json::ParserError: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte)"{

Lemme know if this helps.

https://drive.google.com/file/d/1Ibgg6Mqhm6r-PghUsrlbLWk7qJhDXbee/view?usp=sharing

Badger · April 11, 2019, 5:24pm

When I ingest that fake json using filebeat the json filter parses it just fine. This is with 6.7.1

John_Michael_Burke · April 11, 2019, 7:47pm

I switched my ELK_VERSION from 6.6.0 to 6.7.1 and verified that Logstash runs on 6.7.1, but the problem persists. Could I just confirm you are running with these Logstash/Filebeat params:

input {
  beats {
    port => 5044
  }
}

filter {
  json {
    source => "message"
  }
}

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    manage_template => true
    template => "/usr/share/logstash/logstash-template.json"
    template_name => "logstash*"
    template_overwrite => true
  }
  stdout {}
}

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /usr/share/filebeat/where_you_put_the_json/*.json
    multiline.pattern: '.'
    multiline.match: after
    multiline.max_lines: 30000
    multiline.max_bytes: 1000 MiB

output.logstash:
  hosts: ["logstash:5044"]

setup.kibana:
  host: "kibana:5601"

setup.template.settings:
  index.number_of_shards: 10

setup.dashboards.enabled: true

Badger · April 11, 2019, 8:06pm

I'm not using an elasticsearch output, but the failure occurs before the event reaches the output so that should make no difference. The rest looks the same.

John_Michael_Burke · April 19, 2019, 4:28pm

Well I figured out my issue and I'm sure it is common knowledge but definitely worthwhile to know.

All of my JSON logs are created using the Python json library, which does not add a newline to the end of the JSON file.

I found that, regardless of whether I use Filebeat's multiline setting, the Logstash json filter always cuts off the last line of my JSON. Once I began adding a trailing newline, the problem miraculously disappeared. I don't know if a trailing newline is a standard for JSON files of some sort, but if it is not, I feel like this might be worth fixing in Logstash.
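A minimal Python sketch of that workaround follows. The function names dump_with_newline and ensure_trailing_newline are hypothetical. The first writes new logs with a trailing newline, since json.dump does not emit one itself; the second appends the missing newline to files that already exist. The thread does not establish which component drops the unterminated last line, so this only addresses the symptom described above.

```python
import json
from pathlib import Path


def dump_with_newline(data, path: Path) -> None:
    """Write pretty-printed JSON and terminate the file with a newline."""
    # newline="\n" avoids \r\n line endings on platforms that would add them.
    with path.open("w", encoding="utf-8", newline="\n") as fh:
        json.dump(data, fh, indent=2)
        fh.write("\n")  # json.dump itself does not write a trailing newline


def ensure_trailing_newline(path: Path) -> bool:
    """Append a newline to an existing file if it lacks one; return True if changed."""
    raw = path.read_bytes()
    if not raw or raw.endswith(b"\n"):
        return False  # empty or already terminated, nothing to do
    with path.open("ab") as fh:
        fh.write(b"\n")
    return True
```

Running ensure_trailing_newline over the older log files before Filebeat picks them up would avoid having to regenerate them.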

Cheers and thank you for the assistance Badger!

system · May 17, 2019, 4:28pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
