Hi,
I've got a multiline log file with the following structure:
2021-12-06 11:36:31,088 ....
2021-12-06 11:36:32,588 ...
2021-12-06 11:36:34,127 ---------------------------
ID: 82506
Response-Code: 200
Content-Type: application/pdf
Headers: {Content-Type=[application/pdf], Date=[Mon, 06 Dec 2021 10:36:34 GMT]}
Payload: %PDF-1.5
%����
2 0 obj<</Filter/FlateDecode/Length 16681>>stream
x^A��Ms^]Ǖ���^Uw5*D^Xнu?�w�,��A۲͎�^Y�^W^P^HʰI�^F@�{~�,z1���^C�yޓY^W$^E�&bb�^Q^DNfV~��<�U�ۓ�x�ݏ���xqﺡ��qu��]�W��O���Ŵ�v�^X��j3�^W�#M6�ŭ�z���O~���/�m�z�� ��V��z��1ȳ�_^O���_��_�ߜm7^W���wg���~X=�^=?ۮ/�û7g���8��~��Ң�p���Wgljv����ٯ�|������!���w���ϸO�����Wo�v�������-+^Y��V/o���ݭ�nΟ����p}w��9^Nw+��^�������ՕS>���ͻ�qO���^^�����4�q�,^S^_���O��~�=��^M�]��3Y>�_�Iܞ�7L�"^K���� ?^]�~ry���4^]^G����?;�40iXv^X�o�n
2021-12-06 11:37:34,127 ...
2021-12-06 11:37:34,127 ...
Config:
- type: log
  fields_under_root: true
  paths:
    - /mydirtylogfile
  multiline:
    pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
    negate: true
    match: after
This works perfectly.
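To make clear what I expect the multiline settings to do, here is a minimal Python sketch of the grouping as I understand it: a line that starts with a date begins a new event, and every other line is appended to the event before it. This only illustrates the behaviour and is not how Filebeat implements it; the name group_events and the shortened sample lines are made up for the example.

```python
import re

# Same pattern as in the Filebeat config above.
START = re.compile(r'^[0-9]{4}-[0-9]{2}-[0-9]{2}')

def group_events(lines):
    """Group lines into events (hypothetical helper, not a Filebeat API).

    A line matching START begins a new event; a non-matching line is
    appended to the previous event (negate: true, match: after).
    """
    events = []
    for line in lines:
        if START.match(line) or not events:
            events.append([line])
        else:
            events[-1].append(line)
    return ["\n".join(event) for event in events]

# Shortened sample lines, taken from the log excerpt above.
sample = [
    "2021-12-06 11:36:32,588 ...",
    "2021-12-06 11:36:34,127 ---------------------------",
    "ID: 82506",
    "Response-Code: 200",
    "2021-12-06 11:37:34,127 ...",
]

events = group_events(sample)
for event in events:
    print(repr(event))
```

With the sample above, the "ID:" and "Response-Code:" lines end up in the same event as the 11:36:34 line, which is what I see in practice.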
However, the lines that do not start with a date can contain binary dumps (see the example above). In the past this went through without problems (using filebeat -> logstash -> elastic).
I think this now happens more often. After a while no log lines are processed any more, and lines are indexed 2 hours later than the time in the log line (log line timestamp vs. @timestamp from logstash). So it looks like filebeat is having issues.
Is there a way to exclude "binary lines" from being processed by filebeat? I've talked to development to see if they can simply not log the data, but that is "impossible".
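To be concrete about what I mean by a "binary line", here is a rough Python sketch of the kind of filter I have in mind. It is not a Filebeat feature, only an illustration: the names looks_binary and drop_binary_lines and the 0.3 threshold are assumptions I picked for the example.

```python
def looks_binary(line, threshold=0.3):
    """Heuristic (assumption): a line is 'binary' when more than
    `threshold` of its characters are control characters or U+FFFD,
    the replacement character produced by failed decoding."""
    if not line:
        return False
    suspicious = sum(
        1 for ch in line
        if ch == "\ufffd" or (ord(ch) < 32 and ch not in "\t\r\n")
    )
    return suspicious / len(line) > threshold

def drop_binary_lines(lines):
    """Keep only the lines that do not look binary."""
    return [line for line in lines if not looks_binary(line)]

# Made-up lines modelled on the log excerpt above.
kept = drop_binary_lines([
    "ID: 82506",
    "\ufffd\ufffd\x01\x02abc",
    "Payload: %PDF-1.5",
])
print(kept)
```

Ideally the header lines (ID, Response-Code, Content-Type) would still be shipped and only the payload dump would be dropped.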
Anyone got any pointers for this?