Filebeat encodes harvested content before shipping

Hi guys,

First, the problem.
I'm running Filebeat on an Apache web server and want to ship the access logs to Logstash.
I ran into problems with the shipped content and have already narrowed it down to Filebeat.
The output is currently set to file, and I sometimes see lines like this in the output file:

{"@timestamp":"2018-10-25T10:30:25.943Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.4.2"},"prospector":{"type":"log"},"input":{"type":"log"},"beat":{"name":"liintra311","hostname":"liintra311","version":"6.4.2"},"host":{"name":"liintra311"},"source":"/www/vofapl-int-401-ui/logs/apache/liintra312.access_log","offset":4832199,"message":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u000
0\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000141.77.10.108 vofapl-int-401.bmwgroup.net qxv1658 [25/Oct/2018:12:30:21 +0200] "GET /vofapl_bc/api/v1/program_scenario/22695/product_type/1/placement/scenariostate HTTP/1.1" 200 80 "https://vofapl-int-401.bmwgroup.net/vofapl_ui/?conversationid=S3gGR7VQosPrBtebKHdl\u0026env=workplace\u0026lang=de\u0026mwpOrigin=https%3A%2F%2Fworkplace-int2.bmwgroup.net\u0026strongAuth=1\" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36"","type":"apache-access-logs"}

As I said, this already happens on the Filebeat side, so I won't post the Logstash or other configs here.
This is my Filebeat config:

./filebeat.yml

filebeat.config.modules:
  enabled: true
  path: ${path.config}/modules.d/*.yml

filebeat.prospectors:

filebeat.config.inputs:
  enabled: true
  path: conf.d/*.yml
  reload.enabled: true
  reload.period: 10s

#output.logstash:
#  hosts: ["10.248.114.155:5044"]
#  bulk_max_size: 5120

output.file:
  path: "/lfs/elastic/filebeat-6.4.2-linux-x86_64"
  filename: filebeat-output
  rotate_every_kb: 10000
  number_of_files: 7
  permissions: 0600
  bulk_max_size: 0

And this is the prospector config (./conf.d/apache-access-logs.yml):

- type: log
  paths:
    - /www/vofapl-int-/logs/**/liintra31.access_log

  #encoding: utf-16

  multiline.pattern: '^\b\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\b\s'
  multiline.negate: false
  multiline.match: after

  fields:
    type: apache-access-logs

  fields_under_root: true

I have no clue what's going on and hope somebody can help me here :frowning:
I found some topics about similar problems, but in the end they gave me no helpful hints :frowning:

Thanks guys!

This looks weird at first, but I think your multiline pattern might be wrong.

The following looks like the end of a message:

\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000

And this looks like the beginning:

[25/Oct/2018:12:30:21 +0200] "GET /vofapl_bc/api/v1/program_scenario/22695/product_type/1/placement/scenariostate HTTP/1.1" 200 80 
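One way to check the pattern outside Filebeat is a quick sketch in Python. It assumes that Python's `re` module treats this particular pattern the same way Filebeat's regex engine does, and the sample lines are shortened stand-ins (host, user and request path are made up), not real entries from the log.

```python
import re

# Pattern copied verbatim from the prospector config in this thread.
MULTILINE_PATTERN = re.compile(r'^\b\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\b\s')


def starts_with_ip(line: str) -> bool:
    """Return True if the multiline pattern matches at the start of the line."""
    return MULTILINE_PATTERN.match(line) is not None


# Shortened, illustrative access-log line (hypothetical host, user and path).
clean_line = '141.77.10.108 example.host user [25/Oct/2018:12:30:21 +0200] "GET / HTTP/1.1" 200 80'
# The same line with a run of NUL characters in front, as in the shipped event.
padded_line = "\u0000" * 16 + clean_line

print(starts_with_ip(clean_line))   # ordinary line: pattern matches
print(starts_with_ip(padded_line))  # NUL-prefixed line: pattern does not match
```

As far as I understand the multiline settings, `negate: false` together with `match: after` appends every matching line to the preceding non-matching line. With this pattern that would make each ordinary access-log line a continuation, and only a line that does not start with an IP (such as the NUL-prefixed one) would begin a new event.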

You're right, those are two different log entries, but the strange thing is where these \u0000 characters are coming from in the first place...

I tracked down one occurrence exactly, and this seems to narrow things down immensely, but I don't know the solution.

This is the filebeat output (snippet)

/?conversationid=yorat4kDWJnec6hw8vXf\u0026env=w

This is where it comes from (source)

/?conversationid=yorat4kDWJnec6hw8vXf&env=w

Filebeat seems to encode the input, but I don't know why.
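For what it's worth, `\u0026` is the JSON escape sequence for `&`. Go's standard JSON encoder (Filebeat is written in Go) escapes `&`, `<` and `>` by default, so this looks like a representation detail rather than a change to the data. A minimal check, using the two snippets from this post and assuming the output line is parsed as JSON downstream:

```python
import json

# Snippet as it appears in the Filebeat file output (JSON-escaped), wrapped
# in quotes so that it is a valid JSON string literal.
shipped = r'"/?conversationid=yorat4kDWJnec6hw8vXf\u0026env=w"'

# Snippet as it appears in the source access log.
source = "/?conversationid=yorat4kDWJnec6hw8vXf&env=w"

# Any JSON parser turns \u0026 back into "&".
decoded = json.loads(shipped)
print(decoded == source)
```

If the escaped form is a problem for a consumer, I believe the JSON codec has an `escape_html` setting, but that is an assumption to verify against the documentation for your Filebeat version.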

The log file has these details:

file -bi /www/vofapl-int-401-ui/logs/apache/liintra312.access_log

text/plain charset=us-ascii

Nobody...? To me it seems to be a configuration issue, but I wasn't able to change anything through configuration.
Sure, I can control how the harvesters open and close files, and I can also change the encoding globally, but none of that changed the behaviour I described :frowning:
It's pretty frustrating :frowning:

OMG, interesting stuff...
I have now configured the codec for my file output like this:

output.file:
  .....
  codec.format:
    string: '%{[@timestamp]} %{[message]}'

And I don't see any of these "\uXXXX" encodings anymore.
Does that mean it happens during the JSON transformation?
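A small sketch of why the escapes would disappear with a plain-text format: JSON has to write control characters such as NUL as `\u0000`, while a plain string output can write them as raw bytes, which most terminals and editors simply do not display. The message below is a made-up, shortened stand-in, and whether the format codec really passes NUL bytes through unchanged is an assumption on my part.

```python
import json

# Hypothetical message: three NUL characters followed by the start of a log line.
raw_message = "\x00\x00\x00141.77.10.108 GET /"

# JSON encoding makes the NULs visible as \u0000 escape sequences.
as_json = json.dumps({"message": raw_message})
print(as_json)

# A plain-text rendering keeps the NULs as invisible raw characters.
as_plain = "2018-10-25T10:30:25.943Z " + raw_message
print(as_plain.count("\x00"))
```

If this is what happens, the NUL characters would still be present in the plain-text output file; they just no longer show up as visible escape sequences.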

Really nobody...? Does somebody at least know how I could dive deeper into this?
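One possible way to dig deeper is to check whether the NUL bytes are already present in the source file before Filebeat reads it. The helper below is a hypothetical sketch (the name `find_nul_runs` is made up for this post); it reads the whole file into memory, which should be fine for a log of a few megabytes but not for very large files.

```python
import re


def find_nul_runs(path):
    """Return a list of (offset, length) tuples, one per run of NUL bytes."""
    with open(path, "rb") as handle:
        data = handle.read()
    return [(m.start(), m.end() - m.start()) for m in re.finditer(rb"\x00+", data)]


if __name__ == "__main__":
    # Path taken from the "source" field of the event shown earlier in this thread.
    log_path = "/www/vofapl-int-401-ui/logs/apache/liintra312.access_log"
    try:
        for offset, length in find_nul_runs(log_path):
            print(f"{length} NUL bytes starting at byte offset {offset}")
    except FileNotFoundError:
        print(f"{log_path} not found on this machine")
```

If runs show up near the `offset` value reported in the event, the NULs come from whatever writes or rotates the log rather than from Filebeat. Note that `file -bi` may only inspect the beginning of a file, so a `us-ascii` result does not necessarily rule out NUL bytes further in.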
