Logstash: Ingesting CSV, filtered by CSV columns

Is it possible to ingest data via logstash by applying a filter on data in the CSV file being ingested?

For example:

input {
  file {
    path => "blah/blah/*.gz"
    max_open_files => 16000
    mode => "read"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    exit_after_read => true
  }
}

# Ingest only rows where [field1] = 1000
filter {
  csv {
    separator => "|"
    columns => [
      "time",
      "field1",
      "field2"
    ]
  }
}

output {
  blah...blah...
}

You could add

if [field1] != "1000" { drop {} }

after the csv filter.
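For example, the complete filter block might look like this (a sketch using the column names from your config; change the value to whatever the pipeline should keep):

filter {
  csv {
    separator => "|"
    columns => [ "time", "field1", "field2" ]
  }

  # Keep only events from the file where field1 is 1000; drop everything else
  if [field1] != "1000" {
    drop {}
  }
}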

So, a follow-up question, because I neglected to explain my first question fully.

I have multiple CSV files, each with the same three fields, and each file is uniquely identifiable by field1: the field1 value repeats for the entirety of that particular file. For example:

CSV1 with field1 = 1000:
field1|field2|field3

CSV2 with field1 = 2000:
field1|field2|field3

CSV3 with field1 = 3000:
field1|field2|field3

By adding if [field1] != "1000" { drop {} }, it will still read through all entries of all the CSVs, correct? The reason I ask is that each CSV will have millions of rows, and reading a file that is not relevant could have a performance impact. In that case I would need to think of another solution.

Thank you!

That will drop any event where [field1] is not "1000", so it will drop every event from the second and third CSVs. But if you want to do that then why read them in the first place?

Sure, because all files arrive in a shared directory and we want the correct pipeline to ingest the relevant file. For example, we don't want the pipeline for field1=1000 ingesting data from the CSV with field1=2000. At least, I am not aware of any way to check the value of field1 before ingesting.

And the way the source system is set up, we cannot have separate directories.
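So each pipeline reads the same shared directory and differs only in the field1 value it keeps, roughly like this (pipeline ids and config paths are just illustrative):

# pipelines.yml (illustrative): one pipeline per field1 value, all reading the shared directory
- pipeline.id: field1-1000
  path.config: "/etc/logstash/conf.d/field1_1000.conf"   # drops anything where [field1] != "1000"
- pipeline.id: field1-2000
  path.config: "/etc/logstash/conf.d/field1_2000.conf"   # drops anything where [field1] != "2000"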

Your suggestion worked in my test and I did not see any performance issue.
