Autodetect_column_names is not working as expected in csv filter plugin

RohanKumbhar · April 16, 2019, 10:25pm

Hi Team,

I'm trying to parse CSV files with a few rows and auto-detect the column names, but it's not working as expected.
For example, the input file temp.csv:

name,compan,emp,abc,address
ro han,,,be,myadd
a b c,234,3 city
,ABC CO. LTD.,mycomp,myemp,myabcc, city WEST

test.conf

input {
  file {
    path => "/tmp/temp.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    skip_header => true
    autodetect_column_names => true
    autogenerate_column_names => true
  }
}
output {
  stdout { codec => rubydebug }
}

output

{
"message" => "name,compan,emp,abc,address",
"path" => "/tmp/temp.csv",
"column2" => "compan",
"myadd" => "address",
"column3" => "emp",
"ro han" => "name",
"@timestamp" => 2019-04-16T17:10:14.888Z,
"be" => "abc",
"@version" => "1"
}
{
"ro han" => "a b c",
"@timestamp" => 2019-04-16T17:10:14.925Z,
"message" => "a b c,234,3 city",
"@version" => "1",
"path" => "/tmp/temp.csv",
"column2" => "234",
"column3" => "3 city"
}
{
"message" => ",ABC CO. LTD.,mycomp,myemp,myabcc, city WEST",
"path" => "/tmp/temp.csv",
"column2" => "ABC CO. LTD.",
"myadd" => "myabcc",
"column3" => "mycomp",
"column6" => " city WEST",
"ro han" => nil,
"@timestamp" => 2019-04-16T17:10:14.926Z,
"be" => "myemp",
"@version" => "1"
}

Kindly help me get the column names auto-detected correctly.

Thanks,
Rohan

Badger · April 16, 2019, 10:57pm

Have you set "--pipeline.workers 1"? You cannot use multiple worker threads with autodetect_column_names because it creates race conditions. Specifically, a second worker thread could parse the second row and use it to set the column names before the first worker thread does so, which appears to be exactly what happened here.

Issues 65 and 72 on GitHub are related.
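
To make the race easier to see, here is a rough Python model of it. This is only a sketch, not the plugin's actual code: the class `CsvFilterModel` and its `filter` method are made-up names, and the rule that empty header cells become `columnN` is an assumption inferred from the output posted above. The model treats whichever row reaches it first as the header, which is the shared state that two workers would be competing over.

```python
import csv


class CsvFilterModel:
    """Toy model (hypothetical): the first row seen becomes the header."""

    def __init__(self):
        # Shared state: set by whichever row happens to arrive first.
        self.columns = None

    def filter(self, line):
        values = next(csv.reader([line]))
        if self.columns is None:
            # Assumption: empty header cells fall back to generated names.
            self.columns = [
                v if v else f"column{i}" for i, v in enumerate(values, start=1)
            ]
            return None  # the row consumed as header yields no fields
        event = {}
        for i, value in enumerate(values, start=1):
            # Extra cells beyond the header also get generated names.
            name = self.columns[i - 1] if i <= len(self.columns) else f"column{i}"
            event[name] = value if value else None
        return event


lines = [
    "name,compan,emp,abc,address",
    "ro han,,,be,myadd",
    "a b c,234,3 city",
]

# One worker: rows reach the filter in file order.
ordered = CsvFilterModel()
in_order = [ordered.filter(line) for line in lines]

# Two workers racing: assume the second row happens to be parsed first.
raced = CsvFilterModel()
out_of_order = [raced.filter(line) for line in (lines[1], lines[0], lines[2])]

print(in_order[1])
print(out_of_order[1])
```

In the out-of-order run the real header line is parsed as data under the names "ro han", "column2", "column3", "be" and "myadd", which is the same shape as the first event in the output above. In this model, processing rows in file order avoids the mix-up, which is what restricting the pipeline to a single worker is meant to achieve.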

RohanKumbhar · April 17, 2019, 10:44am

Thanks @Badger!

Nope, I haven't set the workers to 1; it was left at the default.

system · May 15, 2019, 10:44am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
