Filebeat truncates output data

I'm using Filebeat 7.4.2 to read files from a directory. Each file contains valid JSON, one object per line. Each JSON row in the file is read by Filebeat and sent to a Logstash (v7.4.2) instance, which forwards the data to Elasticsearch (v7.4.2).

Every now and then, Logstash prints an error log like the following:

[2019-12-05T09:05:16,378][WARN][logstash.filters.json][main] Error parsing json {:source=>"message", :raw=>"/www.super-bike.ch/it/caschi-modulari/21111-165734-ls2-ff313-vortex-solid-matt-carbon.html#/2-taglia-s?SubmitCurrency=1&id_currency=1\",\"image_path_locale\":\"wasabi_dragon/images_normalized/9430/73be873540f797cf0855ccb2c03b23575f46762df5e9b0cfb7b0f832fd2bd1a8.jpg\",\"image_path_cdn\":\"wasabi_dragon/images/9430/73be873540f797cf0855ccb2c03b23575f46762df5e9b0cfb7b0f832fd2bd1a8.jpg\"}", :exception=>#<LogStash::Json::ParserError: Unexpected character ('/' (code 47)): maybe a (non-standard) comment? (not recognized as one since Feature 'ALLOW_COMMENTS' not enabled for parser)

It seems the JSON is invalid because it has been truncated somewhere along the way.
Investigating, I found the original file that contains the line causing the error. The file contains these two rows:

{"url":"https://www.demon-tweeks.com/uk/mishimoto-weighted-shift-knob-1147808/","image_path_locale":"wasabi_dragon/images_normalized/330763/f05af6261cf450c191e8740c677007e41d290d162eec3bf25ca0ce607b149603.jpg","image_path_cdn":"wasabi_dragon/images/330763/f05af6261cf450c191e8740c677007e41d290d162eec3bf25ca0ce607b149603.jpg"}
{"url":"https://www.super-bike.ch/it/caschi-modulari/21111-165734-ls2-ff313-vortex-solid-matt-carbon.html#/2-taglia-s?SubmitCurrency=1&id_currency=1","image_path_locale":"wasabi_dragon/images_normalized/9430/73be873540f797cf0855ccb2c03b23575f46762df5e9b0cfb7b0f832fd2bd1a8.jpg","image_path_cdn":"wasabi_dragon/images/9430/73be873540f797cf0855ccb2c03b23575f46762df5e9b0cfb7b0f832fd2bd1a8.jpg"}

Those rows are well formed, and the strange part is that if I now rename the file and put it back in the Filebeat directory, everything works perfectly and the data reaches Elasticsearch.

I enabled debug logging for Filebeat and found that the published message is already truncated:

Dec  5 09:05:15 ES-NVMe1 filebeat[1327]: 2019-12-05T09:05:15.335+0100#011DEBUG#011[processors]#011processing/processors.go:183#011Publish event: {
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   "@timestamp": "2019-12-05T08:05:15.335Z",
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   "@metadata": {
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "beat": "filebeat",
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "type": "_doc",
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "version": "7.4.2"
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   },
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   "message": "/www.super-bike.ch/it/caschi-modulari/21111-165734-ls2-ff313-vortex-solid-matt-carbon.html#/2-taglia-s?SubmitCurrency=1&id_currency=1\",\"image_path_locale\":\"wasabi_dragon/images_normalized/9430/73be873540f797cf0855ccb2c03b23575f46762df5e9b0cfb7b0f832fd2bd1a8.jpg\",\"image_path_cdn\":\"wasabi_dragon/images/9430/73be873540f797cf0855ccb2c03b23575f46762df5e9b0cfb7b0f832fd2bd1a8.jpg\"}",
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   "log": {
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "offset": 367,
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "file": {
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:       "path": "/path/to/json/2019_12_5_9_5_25624.json"
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     }
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   },
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   "fields": {
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "log_type": "stalker"
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   },
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   "input": {
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "type": "log"
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   },
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   "ecs": {
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "version": "1.1.0"
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   },
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   "host": {
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "name": "ES-NVMe1"
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   },
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   "agent": {
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "id": "651bec87-4a0e-4edb-b5e3-795d8fd54031",
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "version": "7.4.2",
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "type": "filebeat",
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "ephemeral_id": "92a2cfde-bb87-4cce-8ae6-8642e183b427",
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:     "hostname": "ES-NVMe1"
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]:   }
Dec  5 09:05:15 ES-NVMe1 filebeat[1327]: }

So something goes wrong inside Filebeat, but it is not deterministic.
Here is the filebeat.yml:

filebeat.inputs:
- type: log
  close_eof: true
  paths:
    - /path/to/json/*
  fields: {log_type: stalker}

output.logstash:
  hosts: ["127.0.0.1:5043"]

logging.level: debug
logging.to_files: true
logging.to_syslog: false
logging.files:
  path: /var/log/filebeat
  name: mybeat.log
  keepfiles: 7

I cannot reproduce the error on demand; it happens every now and then and I don't know why.
Do you have any idea what could cause this?
Any other tests I should try?

Thanks
Marco

I think the culprit could be the config option close_eof: true. This option aborts reading of the file when Filebeat reaches EOF (which happens continuously while the file is being written to live).
Could you remove that one and try again?
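
Something like this, as a minimal sketch based on the input section you posted, with close_eof simply dropped so the default close behaviour applies:

filebeat.inputs:
- type: log
  paths:
    - /path/to/json/*
  fields: {log_type: stalker}
  # no close_eof here: the harvester stays open until close_inactive (default 5m) closes it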

Thank you @Shaoranlaos for the answer.

I forgot to mention that each file is written and closed, then moved into the Filebeat directory.
Those files are never opened again afterwards.

I set close_eof: true only yesterday, to try this option in the hope that it would solve my problem, but it didn't.
So that option is not the cause of the problem.

Before close_eof, I used the close_inactive: 2m option.

How are your files named?
I had a similar problem when copying the files reused the inode number of an older logfile that had not yet been removed from the Filebeat registry and had the same name.
I am not totally sure which of the clean_* options helped me there (it was a year ago and the server has changed since then). I think it was clean_inactive set to the log rotation time, e.g. 12h.

Info: Because the inode number is found in the registry, Filebeat infers the file was only renamed and continues reading from the saved offset, which can lead to such head-truncated messages.

P.S.: You can use ls -i to show the inode numbers of the files and compare them before and after copying (see the example below).
P.S.2: Filebeat also compares the currently saved offset with the length of the file; if the file is shorter than the saved offset, it reads the file from the beginning.
P.S.3: All of this would imply that the lines that come before the truncated line should be missing from your Elasticsearch data.
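
For illustration, a minimal way to check this from the shell (the directory is the one from your config; the listing filenames are just hypothetical):

# record the inode numbers (first column) of the files currently in the watched directory
ls -i /path/to/json/ > inodes_before.txt

# ...old files get deleted and new files get copied in, then list again...
ls -i /path/to/json/ > inodes_after.txt

# compare the first columns of the two listings: an inode number that appears
# in both, but next to a different filename, has been reused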

The filename is formed like this:

20191205081900492920-3019-IT-buffer.json => year/month/day/hours/minutes/seconds/milliseconds/randomNumber-country-buffer.json

where randomNumber is something like this: Math.floor(Math.random() * Math.floor(1000))

Any idea?

Are you deleting logs in that folder? If yes, could you compare the inode numbers of the deleted files with the inode numbers of the newly copied files? If they are the same, a configuration of

ignore_older: 8h
clean_inactive: 12h

could solve it.
You should set these timespans to values that fit within your deletion interval, e.g.:
deletion interval: 18h
ignore_older: 8h
clean_inactive: 12h

see https://www.elastic.co/guide/en/beats/filebeat/current/inode-reuse-issue.html
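
For reference, a minimal sketch of how these options could sit in the input section of your filebeat.yml (based on the config you posted; the timespans are just the example values above and should be adjusted to your deletion interval):

filebeat.inputs:
- type: log
  paths:
    - /path/to/json/*
  fields: {log_type: stalker}
  # skip files whose last modification is older than 8h
  ignore_older: 8h
  # drop the registry state of files that have been inactive for 12h,
  # so a reused inode is treated as a new file and read from offset 0
  # (clean_inactive must be greater than ignore_older + scan_frequency)
  clean_inactive: 12h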


Thank you very much for all this information!

Actually, every day we remove the previous day's files.
I will give your configuration a try; I have high expectations for it.

Thank you again!
