Csv filter output splitting a row in two because of some special char

stillfreem · November 9, 2021, 5:54am

I have a CSV file with 7543 entries that I'd like to send to our cloud SIEM.

My config file is as follows:

input {
    file {
        path => "/home/<user>/myfile.csv"
        # Read existing content on first contact with the file
        start_position => "beginning"
        #sincedb_path => ""
    }
}

filter {
    csv {
        # Take column names from the first (header) row
        autodetect_column_names => true
        separator => ","
        skip_empty_columns => false
    }
}

output {
    microsoft-logstash-output-azure-loganalytics {
        workspace_id => "MyID"
        workspace_key => "Mykey"
        custom_log_table_name => "<Mytablename>"
        # Array elements are separated by commas
        key_names => ['name', 'of', 'my', 'columns']
    }
    stdout {}
}

The configuration basically works, but I have three problems.

1. Not all entries were sent to the SIEM. I ran it several times, renaming the file each time: the first run sent about 4000 entries and the second about 6000, but never all 7543.
2. Some errors appeared on standard output, and it seems that Logstash split several entries into two because of what I believe is a character that needs to be escaped. Look at the entry below, and pay close attention to the quoted value that spans two lines:

33746,v33746,dali163,IBM-MF-DALI,1350,SuSe,Suse Linux Enterprise Server 12.2,Linux,20,0,2,,zLINUX,RD-Test Server,SIMPANA (30d),ACTIVE,none,10.20.77.163,unknown,null,IF-MF-VM,hkaf,"dow, John",1011310020,"SLES 11_x000d_
Adabas test server",null,thsc,4/28/2014 16:53,EUR\bas,11/4/2021 5:30,UNIX Container Agent,,null,Germany,DAE,Darmstadt (Germany; DAE),V9,IBM z15,15,6,10,11,33,1,22,59963,5205,57,57,11/4/2021 1:36,0.04,null,null,11/4/2021 4:30,11/4/2021 5:30,0.04,,,,,0,,,null,null,

The console output

3. The file that Logstash reads sometimes changes values in existing entries: rather than adding new rows, the source updates some values in the old ones. I tested this by changing one value, and Logstash did not pick up the change. As I understand it, sincedb only waits for new rows. Is there a way to tell Logstash to watch for changed values in existing rows too?

Thank you all in advance

stillfreem · November 9, 2021, 6:27am

I found the cause of question 2.
A value in the Comment column contained a newline:

SLES 11_x000d_
Adabas test server

Once I removed the newline, the row parsed correctly:
SLES 11_x000d_Adabas test server

I then added the following codec to the file input, and it worked. All entries were ingested into my SIEM without CSV parse errors:

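codec => multiline {
    # A new record starts with a line that begins with a digit
    pattern => "^[0-9]"
    # Lines that do NOT match the pattern...
    negate => true
    # ...are appended to the previous line
    what => "previous"
}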
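For context, here is a sketch of where that codec sits in the complete file input. It assumes, as the pattern implies, that every record in this file starts with the numeric value in the first column. One caveat: the multiline codec only emits a record once it sees the start of the next one, so the last row of the file can stay buffered. The codec's auto_flush_interval option (in seconds) flushes the buffer after a quiet period; the value 5 below is an arbitrary choice, not something from the thread.

```
input {
    file {
        path => "/home/<user>/myfile.csv"
        start_position => "beginning"
        codec => multiline {
            # A new record starts with a line that begins with a digit
            # (assumed: the numeric first column in this file).
            pattern => "^[0-9]"
            # Lines that do NOT match the pattern...
            negate => true
            # ...are appended to the previous line, which re-joins
            # quoted values that contain a newline.
            what => "previous"
            # Arbitrary example value: emit the buffered record after
            # 5 seconds without new data, so the last row is not held back.
            auto_flush_interval => 5
        }
    }
}
```

Note that a continuation line which itself happens to start with a digit would be treated as a new record by this pattern, so it is worth checking the data for such values.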

Only Q3 remains unanswered for now

Badger · November 9, 2021, 5:18pm

No, in "tail" mode the file input assumes all new data is appended to the end of the file. It will not re-read data it has already read.
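To make that concrete, here is a sketch of a file input with the relevant settings spelled out. mode => "tail" is the default, and the sincedb records how far into each file Logstash has already read; rows edited before that position are not read again. Pointing sincedb_path at /dev/null (on Linux) only stops that position from being persisted across restarts: after a restart the whole file is read again and every row is re-sent, including unchanged ones. It does not make Logstash notice edits while it is running.

```
input {
    file {
        path => "/home/<user>/myfile.csv"
        # "tail" is the default: Logstash follows the end of the file and
        # only reads data appended after the recorded position.
        mode => "tail"
        # Only applies to files Logstash has no recorded position for.
        start_position => "beginning"
        # Do not persist read positions. After a restart the whole file is
        # read again (every row is re-sent); edits made while Logstash is
        # running are still not picked up.
        sincedb_path => "/dev/null"
    }
}
```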

stillfreem · November 9, 2021, 6:00pm

Thank you @Badger. So is it possible at all to tell Logstash to watch for modifications to data it has already read?

Badger · November 9, 2021, 6:04pm

I cannot think of a way to do that.

stillfreem · November 9, 2021, 6:06pm

Thank you anyway @Badger

stillfreem · November 10, 2021, 1:01pm

Hi again @Badger and everyone,
Some of my entries contain German characters and this is breaking my parser.

Check this out

[WARN ] 2021-11-10 13:42:57.718 [[main]<file] plain - Received an event that has a different character encoding than you configured. {:text=>"xxx,Windows,ACTIVE,10.21.36.164,xxx,Gr\\xE4der, XXX,\\r", :expected_charset=>"UTF-8"}

For instance, in the entry above the name is not actually Gr\\xE4der; the character \\xE4 seems to be a German letter. How can I tell Logstash to handle UTF-8 as well as German characters?

Badger · November 10, 2021, 2:29pm

Specify the charset on the input...

file {
    codec => plain { charset => "someValue" }
    ....

If your text contains Gr\\xE4der for Gräder then it is not UTF-8. It could be ISO 8859, CP-1252 or even some other encoding.
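Since the input in this thread already uses the multiline codec, and an input takes only one codec, the charset would presumably go on that codec rather than on a separate plain codec; the multiline codec accepts the same charset option. A sketch, assuming the file is ISO-8859-1 (the Latin-1 member of the ISO 8859 family, in which byte 0xE4 is ä):

```
input {
    file {
        path => "/home/<user>/myfile.csv"
        start_position => "beginning"
        codec => multiline {
            pattern => "^[0-9]"
            negate => true
            what => "previous"
            # Assumption: the export is Latin-1, so byte 0xE4 decodes to "ä".
            # Logstash converts the text to UTF-8 as it reads it.
            charset => "ISO-8859-1"
        }
    }
}
```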

stillfreem · November 10, 2021, 2:59pm

Yep, ISO 8859 did the trick. Many thanks!

system · December 8, 2021, 3:00pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Logstash csv filter unable to handle newline in CSV	Logstash	5	85	September 10, 2024
CSV filter duplicate output result	Logstash	7	832	February 24, 2020
Message field split into multiple entries from CSV input	Logstash	5	435	June 14, 2019
Logtash CSV Parser Fail Muline Value is counting next raw	Logstash	11	238	December 9, 2022
CSV Reader using LogStash. Unable to Read CSV	Logstash	5	470	April 1, 2020