# Filebeat not picking up CSV properly

I have a filebeat config to pick up a CSV file shown below:

paths:
- /path/to/CSV
multiline.pattern: '^\d'
multiline.negate: true
multiline.match: after

Here is a sample from the CSV:

RTime,Concept,Time,YestDate,YestCount,PrevDate,PrevCount,TodDate,TodCount
2019-08-12 14:10:39.993000000,WS,20:30,1900-01-01,0,2018-08-13,3,1900-01-01,0
2019-08-12 14:10:39.993000000,WS,21:00,1900-01-01,0,2018-08-13,2,1900-01-01,0
2019-08-12 14:10:39.993000000,WS,21:30,2019-08-11,1,2018-08-13,1,1900-01-01,0
2019-08-12 14:10:39.993000000,WS,Total,1900-01-01,717,1900-01-01,642,1900-01-01,375

The problem is that filebeat only picks up the last couple of characters of a row in the message field.
For example, from the CSV sample above it picked up `01,375` instead of the full row.

I believe it has something to do with the delimiter, but I haven't been able to figure out exactly what's needed...

Could it be because of the way the CSV is written to? I noticed that when I renamed the CSV and told filebeat to pick it up, it picked up all the rows fine, but when something new got written, it would only pick up the last bit, as shown above... maybe whatever writes the CSV has to add a newline?

Any help would be appreciated!

Thanks

> Could it be because of the way the CSV is written to?

Potentially, yes. How exactly is the CSV written?

> maybe whatever writes the CSV has to add a newline?

Yes.

Filebeat tails a file; it does not send the complete log file. When tailing, it first splits the log into multiple lines (based on `\n` or `\r\n` by default) and then applies the multiline filter.
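
For illustration, here is a rough sketch in plain Python (not Filebeat's actual code) of what your original `multiline.pattern: '^\d'` with `negate: true` and `match: after` does once the stream has been split into lines: a line that matches the pattern starts a new event, and lines that don't match are appended to the previous one.

```python
import re

# Rough sketch of the multiline grouping from the original config:
#   pattern: '^\d', negate: true, match: after
# A line matching the pattern starts a new event; lines that do not match
# are appended to the previous event. This mimics the documented behaviour,
# it is not Filebeat's implementation.
pattern = re.compile(r"^\d")

lines = [
    "RTime,Concept,Time,YestDate,YestCount,PrevDate,PrevCount,TodDate,TodCount",
    "2019-08-12 14:10:39.993000000,WS,20:30,1900-01-01,0,2018-08-13,3,1900-01-01,0",
    "2019-08-12 14:10:39.993000000,WS,Total,1900-01-01,717,1900-01-01,642,1900-01-01,375",
]

events = []
for line in lines:
    if pattern.search(line) or not events:
        events.append(line)            # this line begins a new event
    else:
        events[-1] += "\n" + line      # glued onto the previous event

for event in events:
    print(repr(event))
# Every data row starts with a digit, so each row becomes its own event; the
# header (no leading digit) would be appended to the previous event, and here
# it simply becomes the first event because nothing precedes it.
```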

After sending, it remembers the last file offset and starts from this offset when the file size has increased.

The way you describe it, it sounds like there was already a newline written, but then the "old" file contents get overwritten. This would indeed break collection, as filebeat does not track overall file changes, only the file size and the last read offset.
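
Here is a tiny self-contained sketch of that offset behaviour (plain Python; the file name and rows are made up, and this is not Filebeat code). It shows how regenerating the file in place, instead of appending to it, makes the next read start mid-row and produce the same kind of fragment you saw (`01,375`):

```python
# Hypothetical demonstration of offset-based tailing: the "tailer" only
# remembers how far it has read, so if the writer rewrites the whole file
# with slightly different (longer) rows, the old offset lands mid-row.
def read_new_bytes(path, offset):
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data, offset + len(data)

path = "counts.csv"  # hypothetical file

# 1) First version of the file; the tailer reads it all and remembers the offset.
with open(path, "wb") as f:
    f.write(b"Time,Concept,Count\n")
    f.write(b"20:30,WS,0\n")
_, offset = read_new_bytes(path, 0)

# 2) The writer regenerates the whole file instead of appending. The file is
#    now larger, which is all the tailer notices.
with open(path, "wb") as f:
    f.write(b"Time,Concept,Count\n")
    f.write(b"20:30,WS,717\n")
    f.write(b"Total,WS,1375\n")

# 3) Resuming at the remembered offset starts in the middle of a row.
fragment, _ = read_new_bytes(path, offset)
print(fragment)   # -> b'7\nTotal,WS,1375\n' -- a dangling fragment, then a full row
```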

Thanks for the reply, Steffen! So I had the developer who generates the CSV add a newline character after every row, and the pattern is now looking for `^\n`, which I believe should work. However, we have run into another, unrelated issue where filebeat doesn't recognize the output section of the YML, so it doesn't output the way I want it to... When I test the config, it says "Config OK", but the output part of the YML is being completely ignored for some reason.

Can you share your config?

- type: log
  enabled: true
  paths:
    - D:\path\to\CSV.csv
  multiline.pattern: '^\n'
  multiline.negate: true
  multiline.match: after
  fields:
    LogName: CSOOrderCount
- type: log
  enabled: true
  paths:
    - D:\path\to\CSV.csv
  multiline.pattern: '^\n'
  multiline.negate: true
  multiline.match: after
  fields:
    LogName: QCAppCount

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

fields:
  app_id: stores-corplogs

output.kafka:
  hosts: ["host1:5044", "host2:5044", "host3:5044"]
  topic: '%{[fields.app_id]}'
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000

We are setting the topic dynamically using one of the fields from the config. However, the problem is that the output isn't being recognized at all. Below is a sample from the filebeat logs (it then goes on to collect metrics like it normally does, but the output section is never recognized):

2019-08-15T14:28:02.065-0700 INFO instance/beat.go:292 Setup Beat: filebeat; Version: 7.3.0
2019-08-15T14:28:02.066-0700 INFO [publisher] pipeline/module.go:97 Beat name: rkscomprdsql1
2019-08-15T14:28:02.076-0700 WARN beater/filebeat.go:152 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2019-08-15T14:28:02.076-0700 INFO instance/beat.go:421 filebeat start running.
2019-08-15T14:28:02.076-0700 INFO [monitoring] log/log.go:118 Starting metrics logging every 30s
2019-08-15T14:28:02.077-0700 INFO registrar/registrar.go:145 Loading registrar data from C:\ProgramData\filebeat\registry\filebeat\data.json
2019-08-15T14:28:02.077-0700 INFO registrar/registrar.go:152 States Loaded from registrar: 4
2019-08-15T14:28:02.078-0700 WARN beater/filebeat.go:368 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2019-08-15T14:28:02.078-0700 INFO crawler/crawler.go:72 Loading Inputs: 2
2019-08-15T14:28:02.078-0700 INFO log/input.go:148 Configured paths: [D:\path\to\CSV.csv]
2019-08-15T14:28:02.079-0700 INFO input/input.go:114 Starting input of type: log; ID: 17732004819188974107
2019-08-15T14:28:02.079-0700 INFO log/input.go:148 Configured paths: [D:\path\to\CSV.csv]
2019-08-15T14:28:02.079-0700 INFO input/input.go:114 Starting input of type: log; ID: 11699556114771754964
2019-08-15T14:28:02.080-0700 INFO crawler/crawler.go:106 Loading and starting Inputs completed. Enabled inputs: 2
2019-08-15T14:28:02.080-0700 INFO log/harvester.go:253 Harvester started for file: D:\path\to\CSV.csv
2019-08-15T14:28:02.080-0700 INFO cfgfile/reload.go:171 Config reloader started
2019-08-15T14:28:02.081-0700 INFO cfgfile/reload.go:226 Loading of config files completed.
2019-08-15T14:28:02.081-0700 INFO log/harvester.go:253 Harvester started for file: D:\path\to\CSV.csv

I do not see a problem. The harvesters are open for the files.

> 2019-08-15T14:28:02.076-0700 WARN beater/filebeat.go:152 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.

The above line is just a warning. You are connecting to Kafka and should ignore it. Also, please check whether the Kafka topic is receiving the events. Wait for 'metrics' to appear in the logs; then we can tell if any events are produced. Your problem might be one of two things: filebeat was already at the end of the files (filebeat will eventually log 'inactive'), or your multiline pattern is bogus.

By the way, if you want to match an empty line, better use `^\s*$`. When reading lines, filebeat removes the newline separator (there can be different ones). Your multiline pattern suggests that filebeat happily accumulates the whole file in memory (it stops accumulating contents at `max_bytes`, but still continues processing the contents).
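
A quick way to see the difference, using Python's `re` (these simple patterns should behave the same way in Filebeat's Go regexp engine), on lines whose trailing newline has already been stripped, as described above:

```python
import re

# The lines as the multiline matcher sees them: the trailing newline has
# already been removed, so '^\n' can never match anything.
lines = [
    "RTime,Concept,Time,YestDate,YestCount",                  # header
    "2019-08-12 14:10:39.993000000,WS,Total,1900-01-01,375",  # data row
    "",                                                       # empty line
]

for pattern in (r"^\n", r"^\d", r"^\s*$"):
    print(pattern, [bool(re.search(pattern, line)) for line in lines])

# ^\n   [False, False, False]  -> never matches; with negate: true every line
#                                 is glued onto one ever-growing event
# ^\d   [False, True, False]   -> the original pattern: data rows start a new event
# ^\s*$ [False, False, True]   -> matches only the empty line
```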
