MalformedCSVError: Illegal quoting in line 1

Hi all, I am seeing WARNING messages when a CSV file is pushed to Logstash, and I want to eliminate them. Any help is appreciated.

Sample csv:


20185656,2021-02-01 17:52:47,2021-02-01 18:23:15,"Hi Test2, this is check.com/L",ee53b46d906232b0f925b456,,"",+12345445,tail,44a87564-a85f-4aa8-9e82f,Unclassified,"None","None",""

conf file:

input {
  file {
    path => "/var/lib/logstash/test/*.csv"
    start_position => "beginning"
    add_field => { "type" => "test" }
  }
}

filter {
  csv {
    columns => ["id","time","updatetime","text","another","number","number2","number3","source","number4","class","check","check1","check2"]
    separator => ","
  }
  date {
     match => ["reporttime","yyyy-MM-dd HH:mm:ss"]
  }
}

output {
    elasticsearch {
      hosts => ["XXX.XXX.XXX.X:9200"]
      index => "test1"
    }
}
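One thing worth checking in the conf file above: the date filter matches a field named reporttime, but no reporttime column is defined in the csv filter, so as posted the date filter never finds a field to parse. This is separate from the csv warnings. Below is a minimal sketch of the filter section, assuming (this is not stated in the post) that the intent was to set @timestamp from the time column:

```
filter {
  csv {
    columns => ["id","time","updatetime","text","another","number","number2","number3","source","number4","class","check","check1","check2"]
    separator => ","
  }
  date {
    # assumption: "time" is the column meant to drive @timestamp;
    # the original config pointed at "reporttime", which the csv filter never creates
    match => ["time", "yyyy-MM-dd HH:mm:ss"]
  }
}
```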

Warnings seen while tailing logstash-plain.log:

[2021-02-11T19:31:04,910][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2021-02-11T19:32:05,287][WARN ][logstash.filters.csv     ][main][48514e46dc7d34714c27b3b371a9ed647d2e2aee6edc72dc7b32e3364adc1822] Error parsing csv {:field=>"message", :source=>"20185656,2021-02-01 17:52:47,2021-02-01 18:23:15,\"Hi Test2, this is check.com/L\",ee53b46d906232b0f925b456,,\"\",+12345445,tail,44a87564-a85f-4aa8-9e82f,Unclassified,\"None\",\"None\",\"\"", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}
[2021-02-11T19:32:05,301][WARN ][logstash.filters.csv     ][main][aaf4724aa89d5ab109e193a34dd6d144e9f4b45a268932f3c90bb3e70deede4a] Error parsing csv {:field=>"message", :source=>"20185656,2021-02-01 17:52:47,2021-02-01 18:23:15,\"Hi Test2, this is check.com/L\",ee53b46d906232b0f925b456,,\"\",+12345445,tail,44a87564-a85f-4aa8-9e82f,Unclassified,\"None\",\"None\",\"\"", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}

I have tested removing all the quotation marks from the message, and with that no _csvparsefailure tag is added. However, I want to keep the quotation marks as they are in the fields that have them and still get no _csvparsefailure. Is this possible?
In my case there does not appear to be any encoding error or quote_char issue like the ones I have seen in other posts.

Very strange. When I run

input { generator { count => 1 lines => [ '20185656,2021-02-01 17:52:47,2021-02-01 18:23:15,"Hi Test2, this is check.com/L",ee53b46d906232b0f925b456,,"",+12345445,tail,44a87564-a85f-4aa8-9e82f,Unclassified,"None","None",""' ] } }
filter {
      csv { columns => ["id","time","updatetime","text","another","number","number2","number3","source","number4","class","check","check1","check2"] separator => "," }
}
output { stdout { codec => rubydebug } }

I get

      "text" => "Hi Test2, this is check.com/L",
      "time" => "2021-02-01 17:52:47",
"updatetime" => "2021-02-01 18:23:15",
     "class" => "Unclassified"

etc. Could there be non-printing characters in the source file? Try running od -ha on the source file and see if there is anything in it that you do not expect.

Hi Badger,

Thanks for the response. It seems there is an extra \n (newline) at the end of the file.

od -c check38.csv
0000000   2   0   1   8   5   6   5   6   ,   2   0   2   1   -   0   2
0000020   -   0   1       1   7   :   5   2   :   4   7   ,   2   0   2
0000040   1   -   0   2   -   0   1       1   8   :   2   3   :   1   5
0000060   ,   "   H   i       T   e   s   t   2   ,       t   h   i   s
0000100       i   s       c   h   e   c   k   .   c   o   m   /   L   "
0000120   ,   e   e   5   3   b   4   6   d   9   0   6   2   3   2   b
0000140   0   f   9   2   5   b   4   5   6   ,   ,   "   "   ,   +   1
0000160   2   3   4   5   4   4   5   ,   t   a   i   l   ,   4   4   a
0000200   8   7   5   6   4   -   a   8   5   f   -   4   a   a   8   -
0000220   9   e   8   2   f   ,   U   n   c   l   a   s   s   i   f   i
0000240   e   d   ,   "   N   o   n   e   "   ,   "   N   o   n   e   "
0000260   ,   "   "  \n
0000264

od -ha check38.csv

0000000    3032    3831    3635    3635    322c    3230    2d31    3230
          2   0   1   8   5   6   5   6   ,   2   0   2   1   -   0   2
0000020    302d    2031    3731    353a    3a32    3734    322c    3230
          -   0   1  sp   1   7   :   5   2   :   4   7   ,   2   0   2
0000040    2d31    3230    302d    2031    3831    323a    3a33    3531
          1   -   0   2   -   0   1  sp   1   8   :   2   3   :   1   5
0000060    222c    6948    5420    7365    3274    202c    6874    7369
          ,   "   H   i  sp   T   e   s   t   2   ,  sp   t   h   i   s
0000100    6920    2073    6863    6365    2e6b    6f63    2f6d    224c
         sp   i   s  sp   c   h   e   c   k   .   c   o   m   /   L   "
0000120    652c    3565    6233    3634    3964    3630    3332    6232
          ,   e   e   5   3   b   4   6   d   9   0   6   2   3   2   b
0000140    6630    3239    6235    3534    2c36    222c    2c22    312b
          0   f   9   2   5   b   4   5   6   ,   ,   "   "   ,   +   1
0000160    3332    3534    3434    2c35    6174    6c69    342c    6134
          2   3   4   5   4   4   5   ,   t   a   i   l   ,   4   4   a
0000200    3738    3635    2d34    3861    6635    342d    6161    2d38
          8   7   5   6   4   -   a   8   5   f   -   4   a   a   8   -
0000220    6539    3238    2c66    6e55    6c63    7361    6973    6966
          9   e   8   2   f   ,   U   n   c   l   a   s   s   i   f   i
0000240    6465    222c    6f4e    656e    2c22    4e22    6e6f    2265
          e   d   ,   "   N   o   n   e   "   ,   "   N   o   n   e   "
0000260    222c    0a22
          ,   "   "  nl
0000264

That looks fine; there should be a newline at the end of the line. The file input splits lines on it, so it does not end up in the message.

Hi Badger,

I followed your example:

input {
  generator {
    lines => ['20185656,2021-02-01 17:52:47,2021-02-01 18:23:15,"Hi Test2, this is check.com/L",ee53b46d906232b0f925b456,,"",+12345445,tail,44a87564-a85f-4aa8-9e82f,Unclassified,"None","None",""']
    count => 1
  }
}

filter {
  csv {
    columns => ["id","time","updatetime","text","another","number","number2","number3","source","number4","class","check","check1","check2"]
    separator => ","
  }
}
output {
    stdout { codec => rubydebug }
}

Although the output looks good, as you indicated, I still see the WARN messages in logstash-plain.log. Any tips on how to debug these warnings?

[2021-02-12T11:28:17,478][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2021-02-12T11:28:17,761][WARN ][logstash.filters.csv     ][main][ecb10f42f7412d0be0a469c047649034b8fae061c9cffe36cd5242fa3fd195fa] Error parsing csv {:field=>"message", :source=>"20185656,2021-02-01 17:52:47,2021-02-01 18:23:15,\"Hi Test2, this is check.com/L\",ee53b46d906232b0f925b456,,\"\",+12345445,tail,44a87564-a85f-4aa8-9e82f,Unclassified,\"None\",\"None\",\"\"", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}
[2021-02-12T11:28:17,782][WARN ][logstash.filters.csv     ][main][b8a2b2d31dfd9ddfdfc093362dcb73011b868f08a348a6dad854fa38ebf488c3] Error parsing csv {:field=>"message", :source=>"20185656,2021-02-01 17:52:47,2021-02-01 18:23:15,\"Hi Test2, this is check.com/L\",ee53b46d906232b0f925b456,,\"\",+12345445,tail,44a87564-a85f-4aa8-9e82f,Unclassified,\"None\",\"None\",\"\"", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}
[2021-02-12T11:28:17,800][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
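The two WARN lines above carry different plugin IDs (ecb10f42... and b8a2b2d3...), which suggests two separate csv filter instances are processing the same event. One way to tell which filter is yours is to give it an explicit id, which then appears in the [main][...] part of its log lines in place of the generated hash. A minimal sketch; the id value is a hypothetical name:

```
filter {
  csv {
    # hypothetical id: log lines from this filter will show [main][csv_generator_test]
    id => "csv_generator_test"
    columns => ["id","time","updatetime","text","another","number","number2","number3","source","number4","class","check","check1","check2"]
    separator => ","
  }
}
```

If warnings still appear under a generated hash rather than csv_generator_test, they are coming from a csv filter defined somewhere else.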

Sorry, I cannot think what could be causing that. However, if you get two warnings when the count on the generator input is set to 1, that suggests you have another pipeline running.
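Both WARN lines are logged under [main], which would also fit the case where several config files end up in the same pipeline: when path.config points at a directory, Logstash concatenates every file in it, and each event passes through every filter in every file. A common way to keep filters apart is to wrap them in conditionals. A sketch, assuming the type field added by the file input in the original conf file is not set to "test" by any other input:

```
filter {
  # only events carrying the type added by this file's input reach this csv filter
  if [type] == "test" {
    csv {
      columns => ["id","time","updatetime","text","another","number","number2","number3","source","number4","class","check","check1","check2"]
      separator => ","
    }
  }
}
```

Any other csv filters need equivalent conditionals of their own; otherwise they will still see these events. Defining each config as its own pipeline in pipelines.yml is the alternative that keeps them fully separate.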