Error while parsing nested JSON on Filebeat

I'm getting the following error in Logstash when trying to index nested JSON logs that are wrapped in square brackets:

Error parsing json {:source=>"message", :raw=>"]", :exception=>#<LogStash::Json::ParserError: Unexpected close marker ']': expected '}' (for root starting at [Source: (byte[])"]"; line: 1, column: 0])

and here's an example of the log:

[{"Name":"xxxx","Title":"Teste22","Domain":"xxxxcom","BreachDate":"2012-05-05","AddedDate":"2016-05-21T21:35:40Z","ModifiedDate":"2016-05-21T21:35:40Z","PwnCount":164611595,"Description":"In May 2016, <a href=\"https://www.xxxxx.com/oxxxxx-and-thoughts-on-txxxxxxxx-data-breach\" target=\"_blank\" rel=\"noopener\">xxxx had 164 million email addresses and passwords exposed</a>. Originally hacked in 2012, the data remained out of sight until being offered for sale on a dark market site 4 years later. The passwords in the breach were stored as SHA1 hashes without salt, the vast majority of which were quickly cracked in the days following the release of the data.","LogoPath":"https://xxxxxxxxned.com/Content/Images/xxxogos/LinkedIn.png","DataClasses":["Email addresses","Passwords"],"IsVerified":true,"IsFabricated":false,"IsSensitive":false,"IsRetired":false,"IsSpamList":false,"IsMalware":false},{"Name":"xxxxxxcrape","Title":"xxxxxScraped Data","Domain":"xxxxx.com","BreachDate":"2021-04-08","AddedDate":"2021-10-02T21:39:21Z","ModifiedDate":"2021-10-02T21:48:03Z","PwnCount":125698496,"Description":"During the first half of 2021, <a href=\"https://www.xxxxxx.com.au/lxxxxx-data-sxxxxxx-million-users-for-sale-online-2021-4\" target=\"_blank\" rel=\"noopener\">xwas tarxxxxx geted by attackers who scraped data from hundreds of millions of public profiles and later sold them online</a>. Whilst the scraping did not constitute a data breach nor did it access any personal data not intended to be publicly accessible, the data was still monetised and later broadly circulated in hacking circles. The scraped data contains approximately 400M records with 125M unique email addresses, as well as names, geographic locations, genders and job titles. LinkedIn specifically addresses the incident in their post on <a href=\"https://news.linkedin.com/2021/june/an-update-from-xxxxx\" target=\"_blank\" rel=\"noopener\">An update on report of scraped data</a>.","LogoPath":"https://xxxxxx.com/Content/Images/PxxxLogos/Lxxx.png","DataClasses":["Education levels","Email addresses","Genders","Geographic locations","Job titles","Names","Social media profiles"],"IsVerified":true,"IsFabricated":false,"IsSensitive":false,"IsRetired":false,"IsSpamList":false,"IsMalware":false}]

After that I tried to use multiline, but with no success.
I'm using Filebeat to read the log file and send it to Logstash.

Here's an example of my filebeat.yml:

- type: log
  enabled: true
  paths:
    - /zzz/aaa/bbb/cccc/ddd/eee/*.json
  #json.keys_under_root: true
  #json.add_error_key: true
  multiline.pattern: '\{.*\}'
  multiline.negate: false
  multiline.match: before
  multiline.max_lines: 50000
  multiline.timeout: 10s
  fields_under_root: true
  fields:
    fonte: "dom"
  processors:
    - decode_json_fields:
        fields: ["message"]
        target: ""
        process_array: true
        max_depth: 8
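
One variant I also experimented with anchors the multiline pattern on the opening bracket instead, so that every line that does not start with '[' is appended to the previous event (this is a sketch of the idea, not my exact config):

- type: log
  enabled: true
  paths:
    - /zzz/aaa/bbb/cccc/ddd/eee/*.json
  # sketch: treat the whole bracketed array as a single event by
  # appending every line that does not start with '[' to the previous one
  multiline.pattern: '^\['
  multiline.negate: true
  multiline.match: after
  multiline.timeout: 10s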

and the logstash.conf:

input {
  beats {
    port => 6666
  }
}

filter{
}

output {
  if [fonte] == "dom" {
    elasticsearch {
      hosts => ["https://xx.xx.xx0.1xxx:9200"]
      user => "xxx"
      password => "xxxx"
      ssl_certificate_verification => false
      index => "xxxxx-%{+YYYY.MM}"
    }
  }
}
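
For context, the "Error parsing json" message at the top came from a json filter I had in the filter block at the time (reconstructed here from the error's :source=>"message"; not necessarily my exact filter):

filter {
  json {
    # the error above reports :source=>"message", so the filter looked roughly like this
    source => "message"
  }
}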

I tried the multiline approach both in Filebeat and in Logstash (on the Logstash side, something along the lines of the sketch below), but I couldn't find a pattern or configuration that worked. Can anyone help me with this?
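
To be concrete, the Logstash-side attempt used a multiline codec on the beats input, roughly like this (a sketch, not my exact config; the '^\[' pattern was just one of several guesses I tried):

input {
  beats {
    port => 6666
    codec => multiline {
      # sketch: append every line that does not start with '[' to the previous event
      pattern => "^\["
      negate => true
      what => "previous"
    }
  }
}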
