Skip header line in CSV input (v 1.5.0)

Hello,

I am importing some CSV files and would like to skip the first line (the headers). At the moment, if a CSV has a header row, that line fails to import; if I remove the headers, the files import fine. I did some searching and found that I am supposed to use an if statement with drop {}:

    input {
        file {
            path => "C:/ElasticSuite/Logs/*.csv"
            type => "mytype"
            discover_interval => "15"
        }
    }
    filter {
        csv {
            columns => ["col1", "col2", "col3"]
            separator => ","
        }
        if [col1] == "headername" {
            drop {}
        }
        mutate {
            gsub => [
                "ipAddress", "NA", "127.0.0.1",
                "dbLogFileSizeMb", "NA", "0"
            ]
        }
    }

I tried putting the if below the csv filter and then below the mutate, but it still failed either way.

Any ideas?

Thanks,


Does the drop not work at all?

This general issue has been raised here, but I don't have any other advice, sorry.

Thanks, I got it working by looking at a sample within the link that you posted. I had to wrap the column and string in parentheses.

    if ([col1] == "headername") {
        drop { }
    }

Not sure why most of the samples on the net show it without.
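
In case it helps anyone else: you can also drop the header before the csv filter ever sees it, by matching the raw line in [message]. This is just a sketch, and it assumes the header line literally starts with col1, so adjust the pattern to your actual header text:

    filter {
        # Drop the raw header line before the csv filter parses it.
        # Assumes the first line of the file starts with "col1," -
        # change the pattern to match your real header.
        if [message] =~ /^col1,/ {
            drop { }
        }
        csv {
            columns => ["col1", "col2", "col3"]
            separator => ","
        }
    }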

Thanks again


But the bigger question here is still: "How can you have the first row parsed as the column headings, and then dropped during indexing?"

Or: "How can you have the first row parsed as the column headings, but keep that first line from being indexed?"

This discussion should be used to promote https://github.com/elastic/logstash/issues/2088, where a solution to this issue is proposed.
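
For anyone reading this on a newer Logstash: later versions of logstash-filter-csv added autodetect_column_names and skip_header options that address exactly this case (neither existed back in 1.5.0). A minimal sketch, assuming a recent plugin version:

    filter {
        csv {
            separator => ","
            # Take the column names from the file's first line instead
            # of hard-coding them...
            autodetect_column_names => true
            # ...and keep that header row out of the indexed output.
            skip_header => true
        }
    }

Note that autodetect_column_names depends on the header line being processed first, so it generally requires running the pipeline with a single worker.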

I too need to process CSV files with first-line schemas. Unfortunately, the schemas can also vary from file to file.

Not seeing anything out there, I got to work and coded up a subclass of logstash-input-file that adds CSV parsing with a first-line-schema mode. The basic CSV processing is largely borrowed from logstash-filter-csv.

Code is at https://github.com/jweite/logstash-input-csvfile. It's strictly alpha at this stage, but I'd be interested in having it considered for submission.

PS: while I initially considered enhancing logstash-filter-csv, I ultimately concluded that the only 100% reliable way to restart stream processing mid-file is to re-read the file's schema row, something that only the file input plugin can always do.

It'd be worth making a new thread for this so it doesn't get lost :slight_smile:

Will do, thanks for the tip Mark.