Exclude binary data in a log file from Filebeat


I've got a multiline log file with the following structure:

2021-12-06 11:36:31,088  ....
2021-12-06 11:36:32,588 ...
2021-12-06 11:36:34,127 ---------------------------
ID: 82506
Response-Code: 200
Content-Type: application/pdf
Headers: {Content-Type=[application/pdf], Date=[Mon, 06 Dec 2021 10:36:34 GMT]}
Payload: %PDF-1.5
2 0 obj<</Filter/FlateDecode/Length 16681>>stream
x^A��Ms^]Ǖ���^Uw5*D^Xнu?�w�,��A۲͎�^Y�^W^P^HʰI�^F@�{~�,z1���^C�yޓY^W$^E�&bb�^Q^DNfV~��<�U�ۓ�x�ݏ���xqﺡ��qu��]�W��O���Ŵ�v�^X��j3�^W�#M6�ŭ�z���O~���/�m�z�� ��V��z��1ȳ�_^O���_��_�ߜm7^W���wg���~X=�^=?ۮ/�û7g���8��~��Ң�p���Wgljv����ٯ�|������!���w���ϸO��׻���Wo�v�������-+^Y��V/o���ݭ�nΟ����p}w��9^Nw+��^�������ՕS>���ͻ�qO���^^�����4�q�,^S^_���O��~�=��^M�]��3Y>�_�Iܞ�7L�"^K����    ?^]�~ry���4^]^G����?;�40iXv^X�o�n
2021-12-06 11:37:34,127 ...
2021-12-06 11:37:34,127 ...


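My Filebeat input is configured like this:

    - type: log
      fields_under_root: true
      paths:
        - /mydirtylogfile
      multiline:
        # Lines that do not start with a date are appended to the previous line
        pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        negate: true
        match: after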

This works perfectly.

However, the lines that do not start with a date can contain dumped binary data (see the example above). In the past this went through without trouble (using Filebeat -> Logstash -> Elasticsearch).

I think this now happens more often. After a while, no new log lines get processed, and lines are indexed about 2 hours later than the time in the log line itself (log line timestamp vs. the Logstash @timestamp). So it looks like Filebeat is having issues.

Is there a way to exclude these "binary lines" from being processed by Filebeat? I've talked to development to see if they can simply not log the data, but that is "impossible".

Anyone got any pointers for this?

Have you tried using processors? See "Filter and enhance data with processors" in the Filebeat Reference [7.16].
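
As a rough sketch of what that could look like (not tested against your log, and the limits are guesses you would need to tune): `multiline.max_lines` caps how many lines are merged into one event, with any additional lines discarded, and the `truncate_fields` processor cuts the `message` field down to a maximum size. The values 10 and 2048 are assumptions for illustration. The `drop_event` variant at the bottom is an alternative that throws away the whole event, including the ID and Response-Code lines, so it is left commented out.

```yaml
filebeat.inputs:
  - type: log
    fields_under_root: true
    paths:
      - /mydirtylogfile
    multiline:
      pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
      negate: true
      match: after
      # Assumed limit: keep the header lines (ID, Response-Code, ...) and
      # discard further lines of the same event. The default is 500.
      max_lines: 10

processors:
  # Assumed limit: cap the assembled message so that a long binary line
  # cannot blow up the event size.
  - truncate_fields:
      fields:
        - message
      max_bytes: 2048
      fail_on_error: false
      ignore_missing: true

  # Alternative (drops the complete event, not just the payload):
  # - drop_event:
  #     when:
  #       regexp:
  #         message: 'Content-Type: application/pdf'
```

Note that `max_lines` applies to every multiline event from this input, so long stack traces in the same file would be cut off as well.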
