CSV filter duplicates the output result

Hello,

Under Windows 10, with version 7.5.2 of the Elastic Stack, I am trying to build a pipeline from a semicolon-delimited (";") CSV file to Elasticsearch, but the output is not stable.
For instance, it duplicates the result:

The output is:
{
"twod" => 0.0,
"oned" => 0.0,
"@timestamp" => 2020-01-25T13:21:09.117Z,
"spot" => 0.0
}
{
"twod" => 6.0,
"oned" => 3.0,
"@timestamp" => 2020-01-25T13:21:09.138Z,
"spot" => 2.0
}

With the following config file:

input {
  file {
    path => "C:/Users/jutar/OneDrive/Desktop/mktdata.csv"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
  csv {
    separator => ";"
    columns => ["spot","oned","twod"]
    remove_field => ["host","path","@version","message"]
  }
  mutate { convert => ["spot","float"] }
  mutate { convert => ["oned","float"] }
  mutate { convert => ["twod","float"] }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "mktdata-%{+YYYY.MM.dd}"
  }
  stdout {}
}

And when I use these lines in the csv filter of the config file:
csv {
  autodetect_column_names => true
}

it sometimes swaps the column names and their values in the output, i.e.:

      "6" => twod,
      "3" => oned,
"@timestamp" => 2020-01-25T13:21:09.138Z,
      "2" => spot

My CSV file is the following:
"spot" in cell A1, "oned" in cell B1 and "twod" in cell C1;
"2" in cell A2, "3" in cell B2 and "6" in cell C2.

Does anyone know the reason for this strange parsing?

Thank you in advance
Pierre Jutard

This is how Logstash sees the message I have to parse:
{
"column1" => "?spot;oned;twod",
"message" => "?spot;oned;twod\r"
}
{
"column1" => "2;3;6",
"message" => "2;3;6\r"
}

Hello guys,

After investigation, here is what I found when using this filter:
filter {
  csv {
    autodetect_column_names => true
    separator => ";"
    remove_field => ["host","path","@version","message"]
  }
}

When the file is saved in the CSV (DOS) format, the column names and their values are swapped:
{
"5" => "oned",
"7" => "twod",
"@timestamp" => 2020-01-25T14:47:01.840Z,
"3" => "spot"
}

but when the file is saved in the CSV (semicolon) format, it works:
{
"twod" => 7.0,
"oned" => 5.0,
"@timestamp" => 2020-01-25T14:46:02.416Z,
"spot" => 3.0
}

Pierre Jutard

When using autodetect_column_names you must set pipeline.workers to 1, and also disable java_execution.

OK, I was already using the default pipeline.workers setting (apparently 1), and Java execution can be set to false when launching Logstash like this: bin/logstash --java-execution=false. Is there another simple way to do it?

Pierre Jutard

@pierrejutard - You can use one of the following options:

  • command-line flag --java-execution and set the value to false.
  • pipeline.java_execution parameter and set the value to false in the logstash.yml file.
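For reference, a minimal sketch of the second option combined with the single-worker advice given earlier in this thread, assuming a 7.x installation where both settings are read from logstash.yml. As far as I know, pipeline.workers defaults to the number of CPU cores rather than 1, so it is probably safer to set it explicitly. The likely reason this matters is that autodetect_column_names treats the first event reaching the filter as the header, which only holds if events are processed strictly in order.

```yaml
# logstash.yml (sketch; both setting names are the ones mentioned in this thread)

# Process events with a single worker so the header line reaches the csv filter first.
pipeline.workers: 1

# Disable the Java execution engine, as advised above for autodetect_column_names.
pipeline.java_execution: false
```

Logstash needs to be restarted after editing logstash.yml for the change to take effect.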

Thank you, Romain. It worked, and it's a simple way to do it.

Best,
Pierre Jutard
