Filebeat not picking up CSV properly

I have a filebeat config, shown below, to pick up a CSV file:

  paths:
    - /path/to/CSV
  multiline.pattern: '^\d'
  multiline.negate: true
  multiline.match: after

Here is a sample from the CSV:

RTime,Concept,Time,YestDate,YestCount,PrevDate,PrevCount,TodDate,TodCount
2019-08-12 14:10:39.993000000,WS,20:30,1900-01-01,0,2018-08-13,3,1900-01-01,0
2019-08-12 14:10:39.993000000,WS,21:00,1900-01-01,0,2018-08-13,2,1900-01-01,0
2019-08-12 14:10:39.993000000,WS,21:30,2019-08-11,1,2018-08-13,1,1900-01-01,0
2019-08-12 14:10:39.993000000,WS,Total,1900-01-01,717,1900-01-01,642,1900-01-01,375

The problem is that filebeat only picks up the last couple of characters of each row in the message field.
For example, from the CSV sample above it picked up 01,375 instead of the full row.

I believe it has something to do with the delimiter, but I haven't been able to figure out exactly what's needed...

Could it be because of the way the CSV is written to? I noticed that when I renamed the CSV and told filebeat to pick it up, it picked up all the rows fine, but when something new got written, it only picked up the last bit, as shown above... maybe the CSV has to write a new line?

Any help would be appreciated!

Thanks

Could it be because of the way the CSV is written to?

Potentially, yes. How exactly is the CSV written?

maybe the CSV has to write a new line?

Yes.

Filebeat tails a file; it does not send the complete log file. When tailing, it first splits the log into individual lines (based on \n or \r\n by default) and then applies the multiline filter.

After sending, it remembers the last file offset and starts from that offset when the file size has increased.

The way you describe it, it sounds like a newline was already written, but then the "old file contents" get overwritten. This would indeed break collection, as filebeat does not track overall file changes, only the file size and the last read offset.
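
For illustration: once every CSV row is terminated by a newline, each row is a complete line and multiline settings are not needed at all. A minimal input sketch along those lines (the path is a placeholder, and the exclude_lines entry for the header row is an assumption about what you want to ship):

- type: log
  enabled: true
  paths:
    - /path/to/CSV
  # each CSV row arrives as one complete line, so no multiline settings
  # are needed; skip the header row so only data rows are shipped
  exclude_lines: ['^RTime,']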

Thanks for the reply Steffen! I had the developer who was generating the CSV add a newline character after every row, so now it's looking for ^\n, which I believe should work. However, we have run into another, unrelated issue: filebeat doesn't recognize the output section of the YML, so it doesn't output the way I want it to... When I test the config, it says Config OK, but the output part of the YML is being completely ignored for some reason.

Can you share your config?

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - D:\path\to\CSV.csv
  multiline.pattern: '^\n'
  multiline.negate: true
  multiline.match: after
  fields:
    LogName: CSOOrderCount
- type: log
  enabled: true
  paths:
    - D:\path\to\CSV.csv
  multiline.pattern: '^\n'
  multiline.negate: true
  multiline.match: after
  fields:
    LogName: QCAppCount
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false  
fields:
  app_id: stores-corplogs
output.kafka:
  hosts: ["host1:5044", "host2:5044", "host3:5044"]
  topic: '%{[fields.app_id]}'
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000

We are setting the topic dynamically using one of the fields from the config. However, the problem is that the output isn't being recognized at all. Below is a sample from the filebeat logs; filebeat then goes on to collect metrics as it normally does, but the output section is never recognized:

2019-08-15T14:28:02.065-0700    INFO    instance/beat.go:292    Setup Beat: filebeat; Version: 7.3.0
2019-08-15T14:28:02.066-0700    INFO    [publisher] pipeline/module.go:97   Beat name: rkscomprdsql1
2019-08-15T14:28:02.076-0700    WARN    beater/filebeat.go:152  Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2019-08-15T14:28:02.076-0700    INFO    instance/beat.go:421    filebeat start running.
2019-08-15T14:28:02.076-0700    INFO    [monitoring]    log/log.go:118  Starting metrics logging every 30s
2019-08-15T14:28:02.077-0700    INFO    registrar/registrar.go:145  Loading registrar data from C:\ProgramData\filebeat\registry\filebeat\data.json
2019-08-15T14:28:02.077-0700    INFO    registrar/registrar.go:152  States Loaded from registrar: 4
2019-08-15T14:28:02.078-0700    WARN    beater/filebeat.go:368  Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2019-08-15T14:28:02.078-0700    INFO    crawler/crawler.go:72   Loading Inputs: 2
2019-08-15T14:28:02.078-0700    INFO    log/input.go:148    Configured paths: [D:\path\to\CSV.csv]
2019-08-15T14:28:02.079-0700    INFO    input/input.go:114  Starting input of type: log; ID: 17732004819188974107 
2019-08-15T14:28:02.079-0700    INFO    log/input.go:148    Configured paths: [D:\path\to\CSV.csv]
2019-08-15T14:28:02.079-0700    INFO    input/input.go:114  Starting input of type: log; ID: 11699556114771754964 
2019-08-15T14:28:02.080-0700    INFO    crawler/crawler.go:106  Loading and starting Inputs completed. Enabled inputs: 2
2019-08-15T14:28:02.080-0700    INFO    log/harvester.go:253    Harvester started for file: D:\path\to\CSV.csv
2019-08-15T14:28:02.080-0700    INFO    cfgfile/reload.go:171   Config reloader started
2019-08-15T14:28:02.081-0700    INFO    cfgfile/reload.go:226   Loading of config files completed.
2019-08-15T14:28:02.081-0700    INFO    log/harvester.go:253    Harvester started for file: D:\path\to\CSV.csv

I do not see a problem. The harvesters have been started for the files.

2019-08-15T14:28:02.076-0700 WARN beater/filebeat.go:152 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.

The above line is just a warning. You are connecting to Kafka and can ignore it.
Also, please check whether the Kafka topic is receiving the events.
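
If you want to rule Kafka out while debugging, one option (a temporary sketch, not part of your current setup) is to swap output.kafka for the console output and watch whether any events are printed:

# temporary debugging output: print events to stdout instead of Kafka
# (filebeat allows only one output at a time, so comment out output.kafka)
output.console:
  pretty: true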

Wait for 'metrics' to appear in the logs. Then we can tell if any events are produced.
Your problem might be one of two things: either filebeat was already at the end of the files (filebeat will eventually log 'inactive'), or your multiline pattern is bogus. By the way, if you want to match an empty line, better use ^\s*$. When reading lines, filebeat removes the newline separator (there can be different ones), so a pattern of ^\n can never match. Your multiline pattern suggests that filebeat happily accumulates the whole file in memory (it stops accumulating contents at max_bytes, but still continues processing contents).
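
For reference, a corrected sketch of one of your inputs with the suggested empty-line pattern (assuming you still want multiline grouping at all; with one row per line you could drop the multiline settings entirely):

- type: log
  enabled: true
  paths:
    - D:\path\to\CSV.csv
  # '^\s*$' matches an empty line; '^\n' can never match because
  # filebeat strips the newline separator before matching
  multiline.pattern: '^\s*$'
  multiline.negate: true
  multiline.match: after
  fields:
    LogName: CSOOrderCount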
