I'm getting the following error in Logstash when trying to index nested JSON logs that are wrapped in square brackets:
Error parsing json {:source=>"message", :raw=>"]", :exception=>#<LogStash::Json::ParserError: Unexpected close marker ']': expected '}' (for root starting at [Source: (byte[])"]"; line: 1, column: 0])
Here's an example of the log:
[{"Name":"xxxx","Title":"Teste22","Domain":"xxxxcom","BreachDate":"2012-05-05","AddedDate":"2016-05-21T21:35:40Z","ModifiedDate":"2016-05-21T21:35:40Z","PwnCount":164611595,"Description":"In May 2016, <a href=\"https://www.xxxxx.com/oxxxxx-and-thoughts-on-txxxxxxxx-data-breach\" target=\"_blank\" rel=\"noopener\">xxxx had 164 million email addresses and passwords exposed</a>. Originally hacked in 2012, the data remained out of sight until being offered for sale on a dark market site 4 years later. The passwords in the breach were stored as SHA1 hashes without salt, the vast majority of which were quickly cracked in the days following the release of the data.","LogoPath":"https://xxxxxxxxned.com/Content/Images/xxxogos/LinkedIn.png","DataClasses":["Email addresses","Passwords"],"IsVerified":true,"IsFabricated":false,"IsSensitive":false,"IsRetired":false,"IsSpamList":false,"IsMalware":false},{"Name":"xxxxxxcrape","Title":"xxxxxScraped Data","Domain":"xxxxx.com","BreachDate":"2021-04-08","AddedDate":"2021-10-02T21:39:21Z","ModifiedDate":"2021-10-02T21:48:03Z","PwnCount":125698496,"Description":"During the first half of 2021, <a href=\"https://www.xxxxxx.com.au/lxxxxx-data-sxxxxxx-million-users-for-sale-online-2021-4\" target=\"_blank\" rel=\"noopener\">xwas tarxxxxx geted by attackers who scraped data from hundreds of millions of public profiles and later sold them online</a>. Whilst the scraping did not constitute a data breach nor did it access any personal data not intended to be publicly accessible, the data was still monetised and later broadly circulated in hacking circles. The scraped data contains approximately 400M records with 125M unique email addresses, as well as names, geographic locations, genders and job titles. 
LinkedIn specifically addresses the incident in their post on <a href=\"https://news.linkedin.com/2021/june/an-update-from-xxxxx\" target=\"_blank\" rel=\"noopener\">An update on report of scraped data</a>.","LogoPath":"https://xxxxxx.com/Content/Images/PxxxLogos/Lxxx.png","DataClasses":["Education levels","Email addresses","Genders","Geographic locations","Job titles","Names","Social media profiles"],"IsVerified":true,"IsFabricated":false,"IsSensitive":false,"IsRetired":false,"IsSpamList":false,"IsMalware":false}]
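To show why that fragment fails: the whole line is one JSON array, so parsing it in full succeeds, but once the event gets split, a piece like "]" arrives at the parser on its own and cannot be parsed. A minimal sketch in Python (the sample records are placeholders, not my real data):

```python
import json

# Parsing the complete array works fine.
whole_line = '[{"Name":"a"},{"Name":"b"}]'
records = json.loads(whole_line)
print(len(records))  # 2

# But a fragment like the lone "]" from the error above is not valid JSON
# by itself, so any JSON parser rejects it the same way Logstash does.
try:
    json.loads("]")
except json.JSONDecodeError as err:
    print("parse error:", err.msg)
```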
After that I tried to use multiline, but with no success.
I'm using Filebeat to read the log file and send it to Logstash.
Here's an example of my filebeat.yml:
- type: log
  enabled: true
  paths:
    - /zzz/aaa/bbb/cccc/ddd/eee/*.json
  #json.keys_under_root: true
  #json.add_error_key: true
  multiline.pattern: '\{.*\}'
  multiline.negate: false
  multiline.match: before
  multiline.max_lines: 50000
  multiline.timeout: 10
  fields_under_root: true
  fields:
    fonte: "dom"

processors:
  - decode_json_fields:
      fields: ["message"]
      target: ''
      process_array: true
      max_depth: 8
And here's the logstash.conf:
input {
  beats {
    port => 6666
  }
}

filter {
}

output {
  if [fonte] == "dom" {
    elasticsearch {
      hosts => ["https://xx.xx.xx0.1xxx:9200"]
      user => "xxx"
      password => "xxxx"
      ssl_certificate_verification => false
      index => "xxxxx-%{+YYYY.MM}"
    }
  }
}
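For context, the `Error parsing json {:source=>"message", ...}` message is the format emitted by a `json` filter reading the `message` field, so the parse is happening on that field. A minimal version of such a filter (a sketch for illustration, not my exact config) would be:

```
filter {
  json {
    # Parses the raw event text as JSON. When an event has been split
    # upstream, this filter receives a fragment like "]" by itself,
    # which produces the ParserError shown above.
    source => "message"
  }
}
```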
I tried the multiline method in both Filebeat and Logstash, but I couldn't find a pattern or configuration that worked. Can anyone help me with this?