Filebeat disk queue messing up encoding

Hello,

I believe I have found a misbehavior in Filebeat's disk queue mechanism.

When I add a UTF-8 encoded JSON document containing Polish characters, the value in the output is garbled. Example:

^^^^^^^^^^^^^^^^^^^^^
IN: "||OUT1|OUT2|* niesłuszne lub błędne obciążenie ... podwójne obciążenie ... całości lub części środków ...TABELI PONIŻEJ i ZATWIERDŹ.|"

OUT: "||OUT1|OUT2|* niesususzne lub będndne obciżenenie ... podwjnjne obciżenenie ... cao\ufffdocici lub czścici rorodkw w ...TABELI PONIEJEJ i ZATWIERD.|.|"
^^^^^^^^^^^^^^^^^^^^^^

This happens only when the disk queue is enabled; when it is off, everything works as expected. Also, the problem affects only fields that contain inner encoded JSON (the field “executionAdditionalInfo” in the example below). Fields with plain text work fine (e.g. the field “modUserFullName”).
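"Inner encoded JSON" here means a string field whose value is itself a JSON document. When the event is serialized, that value therefore contains escaped quotes as well as multi-byte characters. As a baseline, here is a minimal Go sketch that round-trips such an event through the standard library's `encoding/json`. It is only a stand-in and does not reproduce the disk queue's own serialization, which may differ. The `event` struct, the `roundTrip` helper and the sample values are hypothetical; only the two field names come from this report.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// event is a hypothetical stand-in using the two field names from the report.
type event struct {
	ModUserFullName         string `json:"modUserFullName"`
	ExecutionAdditionalInfo string `json:"executionAdditionalInfo"`
}

// roundTrip serializes ev with encoding/json and decodes it back.
func roundTrip(ev event) (event, error) {
	data, err := json.Marshal(ev)
	if err != nil {
		return event{}, err
	}
	var back event
	err = json.Unmarshal(data, &back)
	return back, err
}

func main() {
	ev := event{
		ModUserFullName: "Łukasz Żółć", // hypothetical sample value
		// Hypothetical "@"-prefixed inner JSON; its quotes are escaped as \" on output.
		ExecutionAdditionalInfo: `@{"info":"niesłuszne lub błędne obciążenie"}`,
	}
	data, _ := json.Marshal(ev)
	fmt.Println(string(data))

	back, err := roundTrip(ev)
	if err != nil {
		panic(err)
	}
	fmt.Println("lossless:", back == ev)
}
```

With a correct encoder, the round trip is lossless for both the plain-text field and the nested JSON string.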

  • Version: 7.10.1
  • Operating System: Windows 10 Enterprise (also reproduced on Windows Server)
  • Steps to Reproduce:
  1. Download Filebeat 7.10.1 for Windows.

  2. Create folders named “input” and “output”.

  3. Replace filebeat.yml with:
    https://gist.github.com/eValker/f5ae0d046d7b6807f2ea5a4cab525dac#file-filebeat-yml

  4. Start Filebeat.

  5. Create file input/test.log with content:
    https://gist.github.com/eValker/f5ae0d046d7b6807f2ea5a4cab525dac#file-test-log
    (remember to leave an empty line at the end of the file)

  6. Check the output file “output/filebeat_out”. Everything works as expected (both "modUserFullName" and "executionAdditionalInfo" are correct, identical to the input).

  7. In filebeat.yml, uncomment lines 35-38 (to enable queue.disk), then restart the Filebeat service.

  8. Copy the first line of the input file and append it to the end of the file (so the service reads it again). Remember to leave one empty line at the end.

  9. Check the output. Characters in the field “executionAdditionalInfo” are garbled.

  10. (optional) Revert step 7 (disable queue.disk again) and restart the service. Everything works correctly again.

I could not find an option in the docs to change the encoding used by the disk queue: https://www.elastic.co/guide/en/beats/filebeat/current/configuring-internal-queue.html#configuration-internal-queue-disk

If this is my mistake, please help.

PS. I am using the "@" character at the beginning of the "executionAdditionalInfo" field on purpose. I do not want this field to be parsed as JSON.