CSV filter output splitting a row in two because of a special character

I have a CSV file with 7543 entries that I'd like to send to a cloud SIEM.

My config file is as follows:

input {
  file {
    path => "/home/<user>/myfile.csv"
    start_position => "beginning"
    #sincedb_path => ""
  }
}

filter {
  csv {
    autodetect_column_names => true
    separator => ","
    skip_empty_columns => false
  }
}

output {
  microsoft-logstash-output-azure-loganalytics {
    workspace_id => "MyID"
    workspace_key => "Mykey"
    custom_log_table_name => "<Mytablename>"
    key_names => ['name', 'of', 'my', 'columns']
  }
  stdout {}
}

The configuration works, but I have a few problems.

  1. Not all entries were sent to the SIEM. I ran it several times, renaming the file each time. The first run sent 4000-something entries and the second 6000-something, but not all 7543.

  2. I got errors on standard output, and it seems that Logstash split several entries in two because of what I believe is a character that needs to be escaped. Check the entry below and pay close attention to the bold part:

    33746,v33746,dali163,IBM-MF-DALI,1350,SuSe,Suse Linux Enterprise Server 12.2,Linux,20,0,2,,zLINUX,RD-Test Server,SIMPANA (30d),ACTIVE,none,10.20.77.163,unknown,null,IF-MF-VM,hkaf,"dow, John",1011310020,"SLES 11_x000d_
    Adabas test server",null,thsc,4/28/2014 16:53,EUR\bas,11/4/2021 5:30,UNIX Container Agent,,null,Germany,DAE,Darmstadt (Germany; DAE),V9,IBM z15,15,6,10,11,33,1,22,59963,5205,57,57,11/4/2021 1:36,0.04,null,null,11/4/2021 4:30,11/4/2021 5:30,0.04,,,,,0,,,null,null,

The console output

  3. The file that Logstash reads can sometimes have values changed in existing entries, meaning no new rows are added but some values in old rows are updated. I ran a test, changing one value, and Logstash didn't pick it up. The way sincedb works, it's just waiting for new rows, but what about old entries with changed values? Is there a way to tell Logstash to watch for these too?

Thank you all in advance :slight_smile:

I found the cause of problem 2.
There was a newline inside a value in the Comment column:

SLES 11_x000d_
Adabas test server

Once I removed the newline, it worked for me :slight_smile:
SLES 11_x000d_Adabas test server

Then I added the following multiline codec and it worked. All entries were ingested into my SIEM without CSV parse errors :slight_smile:

codec => multiline {
  pattern => "^[0-9]"
  negate => true
  what => "previous"
}
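
For readers following along, here is a minimal sketch of how the codec above might sit inside the file input from the original config. Placing it inside `file { }` is an assumption, since the post does not show where the snippet went. `auto_flush_interval` is an optional multiline codec setting that was not used in this thread. It is included only as a suggestion so that the last buffered record is emitted even if no further line arrives, and the value of 2 seconds is arbitrary.

```
input {
  file {
    path => "/home/<user>/myfile.csv"
    start_position => "beginning"
    codec => multiline {
      # Every record in this file starts with a numeric ID, so any line that
      # does NOT start with a digit is appended to the previous line.
      pattern => "^[0-9]"
      negate => true
      what => "previous"
      # Assumption, not from the thread: emit the buffered event after
      # 2 seconds without new data, so the final row is not held back.
      auto_flush_interval => 2
    }
  }
}
```

This relies on continuation lines never starting with a digit. A quoted value whose second line begins with a number would still be split into a separate event.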

Only Q3 remains unanswered for now :slight_smile:

No, in "tail" mode the file input assumes all new data is appended to the end of the file. It will not re-read data it has already read.


Thank you @Badger. So is it possible at all to tell Logstash to watch for modifications to data it has already read?

I cannot think of a way to do that.


Thank you anyway @Badger

Hi again @Badger and everyone,
Some of my entries contain German characters and this is breaking my parser.

Check this out

[WARN ] 2021-11-10 13:42:57.718 [[main]<file] plain - Received an event that has a different character encoding than you configured. {:text=>"xxx,Windows,ACTIVE,10.21.36.164,xxx,Gr\\xE4der, XXX,\\r", :expected_charset=>"UTF-8"}

For instance, in the entry above the name is not really Gr\\xE4der. The \\xE4 seems to be a German character. How can I tell Logstash to handle German characters as well as UTF-8?

Specify the charset on the input...

file {
    codec => plain { charset => "someValue" }
    ....

If your text contains Gr\\xE4der for Gräder then it is not UTF-8. It could be ISO 8859, CP-1252 or even some other encoding.
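
As a concrete illustration of the suggestion above, here is a minimal sketch with the placeholder filled in. The value `"ISO-8859-1"` is an assumption: it is the Latin-1 member of the ISO 8859 family, in which byte 0xE4 is ä. The thread only narrows the encoding down to ISO 8859, CP-1252, or something else.

```
input {
  file {
    path => "/home/<user>/myfile.csv"
    start_position => "beginning"
    # Assumption: the file is Latin-1 encoded. The codec converts the
    # bytes to UTF-8 as events are read.
    codec => plain { charset => "ISO-8859-1" }
  }
}
```

An input takes a single codec, so if the multiline codec from earlier in the thread is in use, the same `charset` option would go on that codec instead of on a separate `plain` codec.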

Yep, ISO 8859 did the trick, many thanks :slight_smile:
