Autodetect_column_names & different header names mixup


(Sjaak) #1

Hi,

I have a piece of hardware that outputs 4 .csv log files. The headers are mostly the same, but not always, and in some cases the same header exists in all 4 files but in a different column (e.g. CN0 exists in all files but is in column 3, 6 or 7).

But autodetect_column_names is mixing up the data. The first file is loaded and its fields are created fine. Once the first file is done and the second file is loaded, things get strange. In some cases it picks up header names that weren't in the first file and creates fields for them, but the data in them is wrong. Sometimes it's also creating column10, column11 and column12 (even though all files have header rows and work when tested individually) and putting incorrect data in them.

I think it's because the plugin is remembering the position of some headers that might be in a different column depending on the file.

Is there any way to reset the autodetect_column_names when it loads a new file?

input {
  file {
    type => "Test"
    path => "/home/test/Desktop/test/*.csv"
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}


filter {
  csv {
    autodetect_column_names => true
  }
  date {
    match => [ "Date", "yyyy-MM-dd HH:mm:ss" ]
    timezone => "UTC"
    target => "@timestamp"
  }
  mutate {
    remove_field => ["message", "Date"]
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
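
One possible workaround, given here only as a sketch under assumptions: if a single csv filter instance really does keep the header it detected from the first file, routing each file to its own csv filter instance with a conditional would let every instance learn the header of just one file. The file names log_a.csv and log_b.csv below are hypothetical, and the sketch assumes the file input stores the source path in [path] (with ECS compatibility enabled the field is [log][file][path] instead). The csv filter documentation also states that autodetect_column_names requires the pipeline to run with a single worker.

```
# Sketch only: log_a.csv and log_b.csv are hypothetical file names.
# Assumes the source path is in [path]; adjust to [log][file][path]
# if ECS compatibility is enabled on the file input.
filter {
  if [path] =~ /log_a\.csv$/ {
    # This instance only sees lines from log_a.csv,
    # so it detects that file's header row.
    csv {
      autodetect_column_names => true
    }
  } else if [path] =~ /log_b\.csv$/ {
    # Separate instance, separate detected header.
    csv {
      autodetect_column_names => true
    }
  }
}
```

The date and mutate filters from the config above would follow the conditional unchanged. If the column layout of each file is fixed, listing the names explicitly with the columns option in each branch (plus skip_header => true) might be a more predictable alternative to autodetection.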
