CSV filter duplicate output result

pierrejutard · January 25, 2020, 1:35pm

Hello,

On Windows 10, with version 7.5.2 of the Elastic Stack, I am trying to build a pipeline from a CSV file (";" delimiter) to Elasticsearch, but the output is not stable.
For instance, it duplicates the result:

The output is:
{
"twod" => 0.0,
"oned" => 0.0,
"@timestamp" => 2020-01-25T13:21:09.117Z,
"spot" => 0.0
}
{
"twod" => 6.0,
"oned" => 3.0,
"@timestamp" => 2020-01-25T13:21:09.138Z,
"spot" => 2.0
}

With the following config file:

input {
  file {
    path => "C:/Users/jutar/OneDrive/Desktop/mktdata.csv"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
  csv {
    separator => ";"
    columns => ["spot","oned","twod"]
    remove_field => ["host","path","@version","message"]
  }
  mutate { convert => ["spot","float"] }
  mutate { convert => ["oned","float"] }
  mutate { convert => ["twod","float"] }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "mktdata-%{+YYYY.MM.dd}"
  }
  stdout {}
}

And when I use these lines in the csv filter of the config file:
csv {
autodetect_column_names => true
}

it sometimes inverts the output columns, i.e.:

      "6" => twod,
      "3" => oned,
"@timestamp" => 2020-01-25T13:21:09.138Z,
      "2" => spot

My csv file is the following:
"spot" in cell A1, "oned" in cell B1, and "twod" in cell C1
"2" in cell A2, "3" in cell B2, and "6" in cell C2.

Does anyone know the reason for this strange parsing?

Thank you in advance
Pierre Jutard

pierrejutard · January 25, 2020, 1:52pm

This is how Logstash sees the messages I have to parse:
{
"column1" => "?spot;oned;twod",
"message" => "?spot;oned;twod\r"
}
{
"column1" => "2;3;6",
"message" => "2;3;6\r"
}

pierrejutard · January 25, 2020, 2:52pm

Hello guys,

After investigation, when using this filter:
filter {
csv {
autodetect_column_names => true
separator => ";"
remove_field => ["host","path","@version","message"]
}

When the file is saved as CSV (DOS), the column names are swapped with their respective values:
{
"5" => "oned",
"7" => "twod",
"@timestamp" => 2020-01-25T14:47:01.840Z,
"3" => "spot"
}

but when the file is saved as CSV (semi-colon), it works:
{
"twod" => 7.0,
"oned" => 5.0,
"@timestamp" => 2020-01-25T14:46:02.416Z,
"spot" => 3.0
}

Pierre Jutard

Badger · January 25, 2020, 3:51pm

When using autodetect_column_names you must set pipeline.workers to 1, and also disable java_execution.

pierrejutard · January 25, 2020, 7:33pm

OK, I was already using the default pipeline.workers setting (= 1, apparently), and Java execution should be set to false when launching the pipelines in Logstash, like this: bin/logstash --java-execution=false. But is there another simple way to do it?

Pierre Jutard

ropc · January 26, 2020, 3:13am

@pierrejutard - You can use one of the following options:

The command-line flag --java-execution, with the value set to false.
The pipeline.java_execution parameter in the logstash.yml file, with the value set to false.
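
As a rough sketch of the second option, assuming Logstash 7.x as used in this thread, the relevant lines in logstash.yml could look like the following. Both setting names come from the replies above; the file location and the choice to set pipeline.workers explicitly are assumptions made for illustration.

```yaml
# config/logstash.yml
# (location is an assumption; it depends on how Logstash was installed)

# Use a single worker. This is intended to keep events in file order,
# so that the csv filter sees the header line first.
# The documented default is the number of CPU cores on the host, not 1,
# so the value is set explicitly here.
pipeline.workers: 1

# Fall back to the Ruby execution engine instead of the Java one.
pipeline.java_execution: false
```

Logstash reads logstash.yml at startup, so it needs a restart before these values take effect.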

pierrejutard · January 27, 2020, 5:59pm

Thank you Romain, it worked, and it's a simple way to do it.

Best,
Pierre Jutard

system · February 24, 2020, 6:00pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
