Varying number of JSON parse errors when parsing the same files

I have a very simple setup:

input {
  file {
    path => "/home/elasticadmin/azure/**/*.*"
    codec => "json"
    close_older => 5
  }
}
filter {}

output {
  elasticsearch {
    user => "user"
    password => "pwd"
    index => "name"
    hosts => ["localhost:9200"]
  }
}

I have a script that downloads files from Azure, and I use Logstash to send them to ES. In the end there will be 90 000+ files, which is why I have set the close_older option.

The problem is that I get a lot of JSON parse error messages. In the example I ran, I have 8600 JSON objects in 1000 files, and the number of documents that get indexed in ES varies from 6500 to 8500. I am quite confident that the files contain properly formatted JSON objects. If there really were a problem with the formatting, I would expect to get the same number of errors every time I ran a test with my sample files, but I get different results every time.

Does anyone have a clue what could cause this?
Running ES/LS 6.3 with X-Pack.

Best regards,
Mattias

Can you share a file that gives different results from run to run? Is each JSON object in a file on a single line?

I will see if I can create a file that behaves that way. The JSON objects are not on a single line; they are newline delimited.

If they are not on a single line, I believe you will need to use a multiline codec to make sure each full JSON object ends up in a single event.
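For objects that are pretty-printed across several lines, a minimal sketch of the multiline approach could look like the following. It assumes (this is not stated in the thread) that every object starts with `{` at the very beginning of a line and that continuation lines are indented or start with something else. The multiline codec produces a plain-text `message`, so the `json` codec is replaced by a `json` filter that parses it; the `auto_flush_interval` value of 2 seconds is an arbitrary choice.

```logstash
input {
  file {
    path => "/home/elasticadmin/azure/**/*.*"
    close_older => 5
    # Append every line that does NOT start with "{" to the previous event,
    # so one pretty-printed object becomes one event.
    codec => multiline {
      pattern => "^\{"
      negate => true
      what => "previous"
      # Emit the last buffered object even if no new "{" line ever arrives.
      auto_flush_interval => 2
    }
  }
}
filter {
  # The joined text is in [message]; parse it into fields here.
  json {
    source => "message"
  }
}
```

This is only needed when a single object spans several lines; one object per line can stay with the plain `json` codec.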

OK, how would the JSON objects be separated if they were all on a single line: by " ", "," or "\t"? And which multiline codec would I use?

Each JSON object should be on a separate line.

Thanks for taking the time to answer my question! I think we may have misunderstood each other.

Just to make sure I understand you correctly, here is a more detailed explanation.
The files look like this:

{JSON}\n (1)
{JSON}\n (2)
{JSON} (3)

I want to index this as three different documents in Elastic.

  1. Do I need a multiline codec for this? (It is not one object spread over multiple lines.)
  2. If not, what could be causing the parse errors?

Best regards,
Mattias

If each object is on a single line and followed by a newline, you will not need to use the multiline codec.

I would recommend identifying parse failures in your config and writing those events to a file. Without knowing what they look like, it is hard to speculate about what could be wrong.
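A minimal sketch of that suggestion, based on the output section from the original post. When the `json` codec cannot parse a line, it keeps the raw text in `message` and adds the tag `_jsonparsefailure`; the conditional below sends those events to a local file instead of Elasticsearch. The path `/tmp/json_failures.log` is a hypothetical choice.

```logstash
output {
  if "_jsonparsefailure" in [tags] {
    # Write the unparsed text, plus fields such as [path] added by the
    # file input, to a local file for inspection.
    file {
      path => "/tmp/json_failures.log"
      codec => rubydebug
    }
  } else {
    # Successfully parsed events go to Elasticsearch as before.
    elasticsearch {
      user => "user"
      password => "pwd"
      index => "name"
      hosts => ["localhost:9200"]
    }
  }
}
```

Comparing the captured `message` values with the corresponding source files (for example, checking whether they are fragments of a longer line) may help tell whether the problem lies in the data itself or in how the files are being read.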
