Logstash: Ingesting CSV, filtered by CSV columns

Is it possible to ingest data via logstash by applying a filter on data in the CSV file being ingested?

For example:

input {
  file {
    path => "blah/blah/*.gz"
    max_open_files => 16000
    mode => "read"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    exit_after_read => true
  }
}

# Ingest only rows where [field1] = 1000
filter {
  csv {
    separator => "|"
    columns => [
      "time",
      "field1",
      "field2"
    ]
  }
}

output {
  blah...blah...
}

You could add

if [field1] != "1000" { drop {} }

after the csv filter.
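For example, the complete filter block might look like this (a sketch using the column names from your config; change the value to whatever the pipeline should keep):

filter {
  csv {
    separator => "|"
    columns => [ "time", "field1", "field2" ]
  }

  # Keep only events from the file where field1 is 1000; drop everything else
  if [field1] != "1000" {
    drop {}
  }
}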

So, a follow-up question, because I neglected to explain my first question fully.

I have multiple CSV files, each with the same three fields, and each file is uniquely identifiable by field1: the field1 value repeats for the entirety of that particular file. For example:

CSV1 with field1 = 1000:
field1|field2|field3

CSV2 with field1 = 2000:
field1|field2|field3

CSV3 with field1 = 3000:
field1|field2|field3

By adding if [field1] != "1000" { drop {} }, it will still read through all entries of all the CSVs, correct? The reason I ask is that each CSV will have millions of rows, and reading a file that is not relevant could have a performance impact. In that case I would need to think of another solution.

Thank you!

That will drop any event where [field1] is not "1000", so it will drop every event from the second and third CSVs. But if you want to do that then why read them in the first place?

Sure, because all files arrive in a shared directory and we want the correct pipeline to ingest the relevant file. For example, we don't want the pipeline for field1=1000 ingesting data from the CSV with field1=2000. At least, I am not aware of any way to check the value of field1 before ingesting.

And the way the source system is set up, we cannot have separate directories.
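So each pipeline reads the same shared directory and differs only in the field1 value it keeps, roughly like this (pipeline ids and config paths are just illustrative):

# pipelines.yml (illustrative): one pipeline per field1 value, all reading the shared directory
- pipeline.id: field1-1000
  path.config: "/etc/logstash/conf.d/field1_1000.conf"   # drops anything where [field1] != "1000"
- pipeline.id: field1-2000
  path.config: "/etc/logstash/conf.d/field1_2000.conf"   # drops anything where [field1] != "2000"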

Your suggestion worked in my test and I did not see any performance issue.
