Converting more than one column destroys data

Converting more than one column from string to integer completely invalidates the Logstash output.

The values are somehow interpreted as keys.

What am I missing when converting?

This happens regardless of whether I use:

  csv { convert => {} }
  mutate { convert => {} }
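
For illustration, the two forms look roughly like this (a minimal sketch; the field name is a placeholder, not my real column name):

  filter {
    csv {
      separator => ";"
      convert => { "key1" => "integer" }
    }
    # or, equivalently, after parsing:
    mutate {
      convert => { "key1" => "integer" }
    }
  }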

What do you mean by this?

Thank you for your reply.

I am quite new to the Logstash pipeline, so I am struggling to identify causes and problems.

My scenario is as follows:

I have multiple .csv files that are read via Filebeat and parsed via the csv filter plugin.

  csv {
    autodetect_column_names => true
    skip_header => true
    separator => ";"
    convert => {
      "key1" => "integer"
      "key2" => "integer"
      ...
    }
  }
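
To inspect what the filter actually emits per event, a stdout output with the rubydebug codec can be used (a minimal debugging sketch, not part of the production config):

  output {
    stdout { codec => rubydebug }
  }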

When converting any number of columns, the output gets strange:

the original strings

 "key1":"1"
 "key2":"2"

that should look like

 "key1":1
 "key2":2

may become
"1":2
or
"key1":1
or
column2:"2"

I have a single worker, and (perhaps depending on the amount of data) sometimes the conversion works without problems and sometimes it adds a random "column28".

Is there some additional synchronizing to be done when handling many large files?

autogenerate_column_names is true by default. If the header row has 27 entries and a later row has 28 fields, the 28th field will be named column28.
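
For example (illustrative data, not from your files): with autodetect_column_names, a two-name header followed by a row with three values

  key1;key2
  1;2;3

would produce "key1" => "1", "key2" => "2", and an autogenerated "column3" => "3".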

Which version of Logstash are you running?

Thank you for your reply.
I am aware of the auto-generation feature, and I am also using autodetect_column_names.
The columns match and the pipeline sometimes succeeds, hence my confusion.

I am using the latest version.


The only thing I can think of is that pipeline.ordered is set to false, so that sometimes it picks up a data row as a header.

Unfortunately, pipeline.workers is already set to 1.

Ordering is then enabled by default.
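
For reference, the relevant logstash.yml settings would look like this (a sketch; pipeline.ordered defaults to auto, which only enforces ordering when there is a single worker):

  pipeline.workers: 1
  pipeline.ordered: auto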
