Hi,
Windows programs often prepend a UTF-8 BOM to text files, which is legal; see "Can a UTF-8 data stream contain the BOM character (in UTF-8 form)?". In my case, it happens with Exchange 2010 Message Tracking logs.
Filebeat should strip the UTF-8 BOM from files read with UTF-8 encoding, but it doesn't, so the BOM appears in the message field for the first line of the file:
{"@timestamp":"2016-04-07T11:58:36.922Z","beat":{"hostname":"XXXXX","name":"XXXXX"},"count":1,"fields":null,"input_type":"log","message":"<U+FEFF>#Software: Microsoft Exchange Server","offset":0,"source":"exchange/MSGTRK20160405-1.LOG","type":"exchange"}
Filebeat config:
filebeat:
  prospectors:
    -
      document_type: exchange
      input_type: log
      paths:
        - exchange/MSGTRK2*.LOG
      encoding: utf-8
output:
  file:
    path: logstash/output
    name: exchange
Hex dump of first line in file:
00000000 ef bb bf 23 53 6f 66 74 77 61 72 65 3a 20 4d 69 |...#Software: Mi|
00000010 63 72 6f 73 6f 66 74 20 45 78 63 68 61 6e 67 65 |crosoft Exchange|
00000020 20 53 65 72 76 65 72 0d 0a | Server..|
The first three bytes, EF BB BF (the UTF-8 encoded BOM), are decoded to the Unicode character U+FEFF (the same code point that serves as the BOM in UTF-16, where it is encoded as FE FF), and that character ends up in the message field.
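To illustrate the behaviour I would expect, here is a minimal Python sketch (not Filebeat's actual code, which is written in Go). It uses the bytes from the hex dump above and shows that a plain UTF-8 decode keeps U+FEFF, while dropping the three BOM bytes first gives a clean line. The helper name `strip_utf8_bom` is hypothetical and only used for this example.

```python
import codecs

# First line of the log file, taken from the hex dump above.
raw = b"\xef\xbb\xbf#Software: Microsoft Exchange Server\r\n"

# A plain UTF-8 decode keeps the BOM as U+FEFF at the start of the string.
with_bom = raw.decode("utf-8")

# Python's "utf-8-sig" codec drops a leading BOM if one is present.
without_bom = raw.decode("utf-8-sig")


def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: remove a leading UTF-8 BOM from raw bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Only a BOM at the very start of the file should be removed; U+FEFF anywhere else in the stream is ordinary data and should be left alone.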
The UTF-8 BOM could also be used to detect that a file is UTF-8 encoded.
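A rough sketch of what such detection might look like, again in Python rather than Filebeat's Go, and only as an assumption about one possible approach: check the first bytes of the file against the known BOMs and fall back to the configured encoding when none matches. The names `detect_bom` and `_BOMS` are made up for this example.

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(head: bytes, default: str = "utf-8"):
    """Hypothetical helper: return (encoding, bom_length) for the file head.

    Falls back to `default` with a length of 0 when no BOM is found.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, len(bom)
    return default, 0
```

The returned length tells the reader how many bytes to skip before decoding, so detection and stripping can be done in one step.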