Varying number of JSON parse errors when parsing the same files


#1

I have a very simple setup:

input {
  file {
    path => "/home/elasticadmin/azure/**/*.*"
    codec => "json"
    close_older => 5
  }
}

filter {}

output {
  elasticsearch {
    user => "user"
    password => "pwd"
    index => "name"
    hosts => ["localhost:9200"] 
  }
}

I have a script that downloads files from Azure, and I use Logstash to send them to ES. In the end there will be 90,000+ files, which is why I set close_older.

The problem is that I get a lot of error messages reporting a JSON parse error. In my test run I have 8600 JSON objects in 1000 files. The number of documents that get indexed in ES varies between 6500 and 8500. I am quite confident that the files contain properly formatted JSON objects. If there really were a problem with the formatting, I would expect the same number of errors every time I ran a test with my sample files, but I get different results on every run.

Does anyone have a clue what could cause this?
Running ES/LS 6.3 with x-pack

Best regards,
Mattias


(Christian Dahlqvist) #2

Can you share a file that gives different results from run to run? Is each JSON object in a file on a single line?


#3

I will see if I can create a file that behaves that way. The JSON objects are not all on a single line; they are newline-delimited.


(Christian Dahlqvist) #4

If each object is not on a single line, I believe you will need to use a multiline codec to make sure the full JSON object ends up in a single event.
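
A minimal sketch of what such a setup could look like, assuming each object spans several lines, starts with `{` in the first column, and no other line starts that way. The pattern, the `auto_flush_interval` value, and the separate `json` filter are assumptions for illustration, not a tested configuration:

```
input {
  file {
    path => "/home/elasticadmin/azure/**/*.*"
    # Assumption: any line that does not start with "{" belongs to the previous event.
    codec => multiline {
      pattern => "^\{"
      negate => true
      what => "previous"
      # Emit the last object of a file after 2 seconds without new lines.
      auto_flush_interval => 2
    }
  }
}

filter {
  # The multiline codec stores the reassembled text in "message"; parse it here.
  json {
    source => "message"
  }
}
```

This is only relevant when a single object is spread over several lines. If every object already sits on its own line, the plain `json` codec reads the file line by line and no multiline handling is needed.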


#5

OK, how would the JSON objects be separated if they were all on a single line: with " ", "," or "\t"? And which multiline codec should I use?


(Christian Dahlqvist) #6

Each JSON object should be on a separate line.


#7

Thanks for taking time to answer my question! I think we may have misunderstood each other.

Just to make sure I understand you correctly, here is a more detailed explanation.
The files look like this:

{JSON}\n (1)
{JSON}\n (2)
{JSON} (3)

I want to index this as three different documents in Elastic.

  1. Do I need a multiline codec for this? (No object is spread over multiple lines.)
  2. If not, what could be causing the parse errors?

Best regards,
Mattias


(Christian Dahlqvist) #8

If each object is on a single line and followed by a newline, you will not need to use the multiline codec.

I would recommend checking for parse failures in your config and writing the failed events to a file. Without knowing what they look like, it is hard to speculate about what could be wrong.
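
One way to do that, as a rough sketch: the `json` codec adds the tag `_jsonparsefailure` to events it cannot parse and keeps the raw text in `message`, so a conditional in the output section can route those events to a file. The file path below is hypothetical, and the elasticsearch settings are copied from the config in the first post:

```
output {
  if "_jsonparsefailure" in [tags] {
    # Hypothetical path. Each failed event is written as one JSON line,
    # including the raw text ("message") and the source file ("path").
    file {
      path => "/tmp/json_parse_failures.log"
    }
  } else {
    elasticsearch {
      user => "user"
      password => "pwd"
      index => "name"
      hosts => ["localhost:9200"]
    }
  }
}
```

Comparing the `message` field of the failed events with the corresponding lines in the source files should show whether Logstash received complete lines or only fragments of them.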

