Varying amount of JSON parse errors when parsing the same files

Matjo · June 28, 2018, 9:56pm

I have a very simple setup:

input {
  file {
    path => "/home/elasticadmin/azure/**/*.*"
    codec => "json"
    close_older => 5
  }
}

filter {}

output {
  elasticsearch {
    user => "user"
    password => "pwd"
    index => "name"
    hosts => ["localhost:9200"]
  }
}

I have a script that downloads files from Azure, and I use Logstash to send them to ES. In the end there will be more than 90,000 files, which is why I set close_older.

The problem is that I get many JSON parse error messages. In my test run I have 8,600 JSON objects in 1,000 files, but the number of documents that get indexed in ES varies between 6,500 and 8,500. I am quite confident that the files contain properly formatted JSON objects. If there really were a problem with the formatting, I would expect the same number of errors every time I run the test with my sample files, but I get different results on every run.

Does anyone have a clue what could cause this?
I am running ES/LS 6.3 with X-Pack.

Best regards,
Mattias

Christian_Dahlqvist · June 29, 2018, 6:45am

Can you share a file that gives different results from run to run? Is each JSON object in a file on a single line?

Matjo · June 29, 2018, 7:37am

I will see if I can create a file that behaves that way. The JSON objects are not on a single line; they are newline-delimited.

Christian_Dahlqvist · June 29, 2018, 7:39am

If they are not on a single line, I believe you will need to use a multiline codec to make sure the full JSON object ends up in a single event.
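
For reference, a minimal sketch of what that could look like. It assumes every JSON object starts with "{" at the very beginning of a line and that no continuation line does; the pattern and the auto_flush_interval value are illustrative assumptions, not something taken from the files in question. Lines that do not start with "{" are appended to the previous event, and the json filter then parses the assembled text:

```
input {
  file {
    path => "/home/elasticadmin/azure/**/*.*"
    # Assumption: a new object always begins with "{" in column 0.
    codec => multiline {
      pattern => "^\{"
      negate => true
      what => "previous"
      # Flush the last pending object after 2 seconds without new lines.
      auto_flush_interval => 2
    }
  }
}

filter {
  # The multiline codec stores the joined lines in "message".
  json {
    source => "message"
  }
}
```

This only makes sense when a single object really spans several lines; for one object per line it is unnecessary.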

Matjo · June 29, 2018, 7:46am

OK, how would the JSON objects be separated if they were on a single line: by " ", "," or "\t"? And which multiline codec should I use?

Christian_Dahlqvist · June 29, 2018, 7:49am

Each JSON object should be on a separate line.

Matjo · June 29, 2018, 1:55pm

Thanks for taking time to answer my question! I think we may have misunderstood each other.

Just to make sure I understand you correctly, here is a more detailed explanation:
The files look like this:

{JSON}\n (1)
{JSON}\n (2)
{JSON} (3)

I want to index these as three separate documents in Elasticsearch.

Do I need a multiline codec for this? (No object spans multiple lines.)
If not, what could be causing the parse errors?

Best regards,
Mattias

Christian_Dahlqvist · June 29, 2018, 2:07pm

If each object is on a single line and followed by a newline, you will not need to use the multiline codec.

I would recommend catching parse failures in your config and writing the failed events to a file. Without knowing what they look like, it is hard to speculate about what could be wrong.
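
One way to do this, as a rough sketch: when the json codec cannot parse a line, it tags the event with _jsonparsefailure and keeps the raw text in the message field, so a conditional in the output section can route those events to a file. The file path below is a hypothetical placeholder; the elasticsearch settings are copied from the original config:

```
output {
  if "_jsonparsefailure" in [tags] {
    # Hypothetical path; pick any location Logstash can write to.
    file {
      path => "/tmp/json_parse_failures.log"
    }
  } else {
    elasticsearch {
      user => "user"
      password => "pwd"
      index => "name"
      hosts => ["localhost:9200"]
    }
  }
}
```

Looking at the message field of the events in that file should show exactly what text Logstash tried to parse, for example whether it received only part of a line.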

system · July 27, 2018, 2:07pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
