My Logstash is not parsing data properly

I have been working on a project to send VPN logs to a Logstash instance that I created. I can see data arriving in Logstash, but a "_dateparsefailure" tag shows up under [tags] in my output. Also, while some of the data parses correctly, some of it ends up in the wrong fields, which tells me that the data as a whole is not being parsed correctly, and I'm not sure why. Can anyone give any input as to why this might be happening?

That simply means something is wrong with your Logstash configuration. There are hundreds of plugins in Logstash, so without an example it's hard to say what. Maybe show your config along with your data, then explain what you expect vs. what happens.

_dateparsefailure usually means you're using a date filter with the wrong date format.

This is the latest version of my config:

input {
    udp {
        port => 42514
        tags => [ "vpn" ]
    }
}
filter {
    grok {
        tag_on_failure => [ "parse_failed" ]
        match => { "message" => '\<%{NUMBER:view}\>%{SYSLOGTIMESTAMP} %{HOSTNAME:fwname} %{GREEDYDATA:raw_message}'}
    }
    csv {
        source => "random"
        separator => ","
        columns => [ "random_number" ]
    }
    csv {
        source => "raw_message"
        separator => '"'
        columns => [ "receive", "description", "device" ]
    }
    csv {
        source => "receive"
        separator => ","
        columns => [ "receive_time", "serial", "type", "subtype", "FUTURE_USE.1", "time_generated", "vsys", "eventid", "object", "module", "severity" ]
    }
    csv {
        source => "description"
        separator => ","
        columns => [ "description" ]
    }
    csv {
        source => "device"
        separator => ","
        columns => [ "number", "seqno", "actionflags", "dg_hier_level_1", "dg_hier_level_2", "dg_hier_level_3", "dg_hier_level_4", "vsys.name", "device_name"  ]
    }
    mutate {
        remove_field => ["random_number", "serial", "FUTURE_USE.1", "vsys", "FUTURE_USE.2", "FUTURE_USE.3", "actionflags", "dg_hier_level_1", "dg_hier_level_2", "dg_hier_level_3", "dg_hier_level_4", "vsys.name"]
    }
    date {
        timezone => "America/Los_Angeles"
        match => [ "receive_time", "YYYY/MM/dd HH:mm:ss"  ]
    }
}

output {
    file {
        path => "/opt/logstash/debug-%{+YYYY-MM-dd}.json"
    }
    stdout { codec => rubydebug }
}

When I look for just the vpn logs that are being sent to this logstash instance, some of the fields that I see being parsed incorrectly are the receive time, subtype, time generated, type, and receive.

There might be other issues, but that is what I see right away that is being parsed incorrectly.

Is there sample data for that config, so we can compare the filter config with the data you receive? You can omit sensitive information as long as the structure and formatting are intact.

{
  "view": "14",
  "fwname": "firewall name",
  "column12": "0",
  "column15": null,
  "receive": "1,2020/05/06 10:59:06,001801026594,SYSTEM,"gateway,0,2020/05/06 10:59:06,,"gateway"-config-release,RA-VPN-GW-Site-N,0,0,general,informational,",
  "description": "gateway client configuration released. User name: "user's email address", Private IP: "IP_Address", Client version: 5.0.8-4, Device name: "Computer_Name", Client OS version: Microsoft Windows 10 Enterprise , 64-bit, VPN type: Device Level VPN.",
  "raw_message": "1,2020/05/06 10:59:06,001801026594,SYSTEM,"gateway",0,2020/05/06 10:59:06,,gateway-config-release,RA-VPN-GW-Site-N,0,0,general,informational,\"gateway client configuration released. User name: "user's email address", Private IP: "IP Address", Client version: 5.0.8-4, Device name: "Computer Name", Client OS version: Microsoft Windows 10 Enterprise , 64-bit, VPN type: Device Level VPN.\",26694772,0x0,0,0,0,0,,fw_name",
  "receive_time": "1",
  "device_name": "firewall name",
  "severity": "0",
  "column14": "informational",
  "device": ",26694772,0x0,0,0,0,0,,firewall name",
  "number": null,
  "eventid": null,
  "time_generated": "0",
  "module": "RA-VPN-GW-Site-N",
  "host": "IP Address",
  "message": "<14>May  6 10:59:06 "firewall name" 1,2020/05/06 10:59:06,001801026594,SYSTEM,"gateway",0,2020/05/06 10:59:06,,gateway-config-release,RA-VPN-GW-"Site"-N,0,0,general,informational,\"gateway client configuration released. User name: "user's email", Private IP: "IP Address", Client version: 5.0.8-4, Device name: "Computer Name", Client OS version: Microsoft Windows 10 Enterprise , 64-bit, VPN type: Device Level VPN.\",26694772,0x0,0,0,0,0,,firewall name",
  "tags": [
    "vpn",
    "_dateparsefailure"
  ],
  "subtype": "SYSTEM",
  "seqno": "26694772",
  "column13": "general",
  "@version": "1",
  "@timestamp": "2020-05-06T17:59:06.209Z",
  "type": "001801026594",
  "object": "gateway-config-release"
}

Here is an example of the output that I see.

Your raw_message starts with the number "1". It gets parsed into the "receive" field and then into "receive_time", which shifts every field after it by one.

Yeah, I have been working on getting rid of that "1", but I haven't found a solution to that. I created a csv filter to remove the "1" with "source => random", and send that field to my "remove_field" mutate filter, but it didn't remove the "1". Do you have any suggestions?

I was able to fix the "_dateparsefailure" error. It was a formatting issue. Now I'm looking at why some of my fields are named incorrectly or aren't being removed like they should be.
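
For reference, a date filter matching the timestamp layout in the sample above (2020/05/06 10:59:06) could look like the sketch below. It assumes receive_time really holds the timestamp rather than the leading "1". The thread does not say exactly what was changed, so treat this as one plausible form, not the actual fix.

```
filter {
  date {
    # Joda-style pattern for values such as "2020/05/06 10:59:06"
    match => [ "receive_time", "yyyy/MM/dd HH:mm:ss" ]
    # The source timestamps carry no zone, so state it explicitly
    timezone => "America/Los_Angeles"
    # The parsed value is written to @timestamp (this is also the default)
    target => "@timestamp"
  }
}
```

If receive_time still contains something other than a timestamp, the event keeps getting the _dateparsefailure tag no matter which pattern is used.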

If the "1" appears consistently and is a decimal number, you can drop it with \d in your first grok filter, or keep it in its own field with %{NUMBER:some_field}.
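
Another option, if changing the grok pattern is awkward, is to strip the leading number afterwards with a mutate filter. This is only a sketch: it assumes the field is called raw_message, as in the config above, and that the value always starts with digits followed by a comma.

```
filter {
  mutate {
    # Remove a leading run of digits and the comma after it, e.g. "1,"
    gsub => [ "raw_message", "^[0-9]+,", "" ]
  }
}
```

This would have to run after the grok filter and before any csv filter that reads raw_message.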

like this?: match => { "message" => '<\d%{NUMBER:view}>%{SYSLOGTIMESTAMP} %{HOSTNAME:fwname} %{GREEDYDATA:raw_message}'}

match => {"message", "<%{NUMBER:view}>%{SYSLOGTIMESTAMP:timestamp} \"%{DATA:fwname}\" %{NUMBER:uniq},%{GREEDYDATA:message}"

That should give you:

{
  "view": "14",
  "fwname": "firewall name",
  "num": "1",
  "message": "2020/05/06 10:59:06, 001801026594,SYSTEM,\"gateway\",0,2020/05/06 10:59:06,,gateway-co nfig-release,RA-VPN-GW-\"Site\"-N,0,0,general,informational,\\\" gateway client configuration released. User name: \"user's email\", Private IP: \"IP Address\", Client version: 5.0.8-4, Device name: \"Computer Name\", Client OS versio n: Microsoft Windows 10 Enterprise , 64-bit, VPN type: Device Level VPN.\\\",26694 772,0x0,0,0,0,0,,firewall name\"",
  "timestamp": "May 6 10:59:06"
}

I made that change in my config, but now I'm not seeing a log being generated when I run my logstash.

I am seeing logs again. I'm not sure what caused the issue, but it's good again.

You should then be able to send the message to a csv filter and do what you want with it.

I am seeing errors in the logs that I wasn't seeing before:
[2020-05-12T13:50:42,926][ERROR][logstash.agent ] Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"LogStash::ConfigurationError", :message=>"Expected one of #, => at line 10, column 36 (byte 164) after filter { \n\tgrok {\n tag_on_failure => [ \"parse_failed\" ]\n \t match => {\"message\"", :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:41:in `compile_imperative'", "/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:49:in `compile_graph'", "/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:11:in `block in compile_sources'", "org/jruby/RubyArray.java:2577:in `map'", "/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:10:in `compile_sources'", "org/logstash/execution/AbstractPipelineExt.java:151:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:22:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:90:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:43:in `block in execute'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:96:in `block in exclusive'", "org/jruby/ext/thread/Mutex.java:165:in `synchronize'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:96:in `exclusive'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:39:in `execute'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:334:in `block in converge_state'"]}

I have confirmed that the errors were caused by the match statement that we have been looking at. For some reason, making that change to my match statement in my grok causes my entire logstash instance to crash.
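
The "Expected one of #, =>" part of that error points at the text right after match => {"message". Inside the braces the parser expects => between the field name and the pattern, but the suggested line has a comma there and never closes the brace. A corrected form might look like the sketch below. It assumes the hostname really is wrapped in double quotes as in the posted sample (drop the quotes from the pattern if that was only redaction), and the stdin input is there purely so a sample line can be pasted in for testing.

```
input {
  # Test input only: paste one sample log line and press Enter
  stdin { }
}

filter {
  grok {
    tag_on_failure => [ "parse_failed" ]
    # Single-quoted pattern so the literal double quotes need no escaping
    match => { "message" => '<%{NUMBER:view}>%{SYSLOGTIMESTAMP:timestamp} "%{DATA:fwname}" %{NUMBER:uniq},%{GREEDYDATA:raw_message}' }
  }
}

output {
  stdout { codec => rubydebug }
}
```

Running Logstash with --config.test_and_exit checks the syntax of a file like this without starting the pipeline, which catches this kind of error before it takes the instance down.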

Now that I am seeing logs being generated again, I am looking at an issue where some of the fields are being incorrectly named: "column13": "general", "column15": null, etc. Is there a way to fix this?

You will need to adjust the column names. Apologies, there was a syntax error in my previous config. After trying your message, I ended up with the result below. Your message has 28 columns:

{
        "column2" => " 001801026594",
       "column11" => "0",
       "column16" => " Client version: 5.0.8-4",
        "column4" => "gateway",
     "@timestamp" => 2020-05-13T02:15:04.680Z,
       "column12" => "general",
       "column14" => " gateway client configuration released. User name: user's email",
        "column7" => nil,
        "column5" => "0",
       "column13" => "informational",
       "column26" => "0",
        "column3" => "SYSTEM",
           "uniq" => "1",
        "column6" => "2020/05/06 10:59:06",
       "@version" => "1",
        "column9" => "RA-VPN-GW-Site-N",
       "column27" => nil,
       "column10" => "0",
       "column21" => "26694 772",
       "column22" => "0x0",
         "fwname" => "firewall name",
       "column24" => "0",
        "column1" => "2020/05/06 10:59:06",
       "column18" => " Client OS versio n: Microsoft Windows 10 Enterprise ",
       "column17" => " Device name: Computer Name",
        "column8" => "gateway-co nfig-release",
       "column19" => " 64-bit",
           "date" => "May 6 10:59:06",
    "raw-message" => "2020/05/06 10:59:06, 001801026594,SYSTEM,gateway,0,2020/05/06 10:59:06,,gateway-co nfig-release,RA-VPN-GW-Site-N,0,0,general,informational, gateway client configuration released. User name: user's email, Private IP: IP Address, Client version: 5.0.8-4, Device name: Computer Name, Client OS versio n: Microsoft Windows 10 Enterprise , 64-bit, VPN type: Device Level VPN.,26694 772,0x0,0,0,0,0,,firewall name",
       "column15" => " Private IP: IP Address",
        "message" => "<14>May 6 10:59:06 \"firewall name\" 1,2020/05/06 10:59:06, 001801026594,SYSTEM,\"gateway\",0,2020/05/06 10:59:06,,gateway-co nfig-release,RA-VPN-GW-\"Site\"-N,0,0,general,informational,\" gateway client configuration released. User name: \"user's email\", Private IP: \"IP Address\", Client version: 5.0.8-4, Device name: \"Computer Name\", Client OS versio n: Microsoft Windows 10 Enterprise , 64-bit, VPN type: Device Level VPN.\",26694 772,0x0,0,0,0,0,,firewall name",
       "column25" => "0",
       "column20" => " VPN type: Device Level VPN.",
       "column23" => "0",
       "column28" => "firewall name"
}

Here's my config. Note that I added the gsub because your sample message contains quotes.

filter {

	grok {
		match => { 'message' => '%{SYSLOGTIMESTAMP:date} "%{DATA:fwname}" %{NUMBER:uniq},%{GREEDYDATA:raw-message}' }
	}

	mutate {
		gsub => ["raw-message", "\"", ""]
	}

	csv {
		source => "raw-message"
		separator => ","


	}
}
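
The columnN names appear whenever the csv filter finds more values than it has names for. One way to get rid of them is a single csv filter with a name for every position. The sketch below assumes the layout seen in the sample: the leading number has already been captured by grok, and the description arrives as one properly quoted field, so the gsub step is left out. The future_use_N names are placeholders invented here for the positions the original column list skipped.

```
filter {
  csv {
    source => "raw-message"
    separator => ","
    # One name per position; the description stays in one piece because
    # the csv filter treats double quotes as the quote character by default
    columns => [
      "receive_time", "serial", "type", "subtype", "future_use_1",
      "time_generated", "vsys", "eventid", "object",
      "future_use_2", "future_use_3", "module", "severity", "description",
      "seqno", "actionflags",
      "dg_hier_level_1", "dg_hier_level_2", "dg_hier_level_3", "dg_hier_level_4",
      "vsys_name", "device_name"
    ]
    # Drop the placeholder positions once parsing succeeds
    remove_field => [ "future_use_1", "future_use_2", "future_use_3" ]
  }
}
```

If the real messages contain unescaped quotes inside the description, the csv filter will tag the event with _csvparsefailure instead, and the quotes would need handling first.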

I got an error message when adding this to my grok: 
```[2020-05-13T13:19:00,157][ERROR][logstash.agent           ] Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"LogStash::ConfigurationError", :message=>"Expected one of #, {, ,, ] at line 29, column 36 (byte 1161) after filter { \n\tgrok {\n                tag_on_failure => [ \"parse_failed\" ]\n                match => { 'message' => '%{SYSLOGTIMESTAMP:date} \"%{DATA:fwname}\" %{NUMBER:uniq},%{GREEDYDATA:raw-message}' }\n\t}\n\tcsv {\n\t\tsource => \"raw_message\"\n\t        separator => '\"'\n                columns => [ \"receive\", \"description\", \"device\" ]\n\t}\n\tcsv { \n\t\tsource => \"receive\"\n                separator => \",\"                       \n\t\tcolumns => [ \"temp_field\", \"receive_time\", \"serial\", \"type\", \"subtype\", \"FUTURE_USE_1\", \"time_generated\", \"vsys\", \"eventid\", \"object\", \"module\", \"severity\" ]\n\t}\n\tcsv {\n                 source => \"device\"\n                 separator => \",\"\n                 columns => [ \"number\", \"seqno\", \"actionflags\", \"dg_hier_level_1\", \"dg_hier_level_2\", \"dg_hier_level_3\", \"dg_hier_level_4\", \"vsys_name\", \"device_name\"  ]\n         }\n\t mutate {\n                   remove_field => [ \"temp_field\", \"serial\", \"FUTURE_USE_1\", \"vsys\", \"FUTURE_USE_2\", \"FUTURE_USE_3\", \"actionflags\", \"dg_hier_level_1\", \"dg_hier_level_2\", \"dg_hier_level_3\", \"dg_hier_level_4\", \"vsys_name\" ]\n\t\t   gsub => [\"raw-message\", \"\\\", \"", :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:41:in `compile_imperative'", "/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:49:in `compile_graph'", "/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:11:in `block in compile_sources'", "org/jruby/RubyArray.java:2577:in `map'", "/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:10:in `compile_sources'", "org/logstash/execution/AbstractPipelineExt.java:151:in `initialize'", 
"/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:22:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:90:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:43:in `block in execute'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:96:in `block in exclusive'", "org/jruby/ext/thread/Mutex.java:165:in `synchronize'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:96:in `exclusive'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:39:in `execute'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:334:in `block in converge_state'"]}
[2020-05-13T13:19:00,480][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2020-05-13T13:19:05,354][INFO ][logstash.runner          ] Logstash shut down.

It looks like you might have missed some commas. I'm going to add them in, and see what happens.
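
Judging only from the config text echoed in the error, parsing stops inside the gsub array, where the quote pattern appears as "\" followed by a comma, so the escaped quote seems to have swallowed the rest of the line. That reading is an assumption, since the file itself was not posted. One way to sidestep the escaping altogether is to put the double quote inside single quotes:

```
filter {
  mutate {
    # '"' is a one-character pattern: a literal double quote
    gsub => [ "raw-message", '"', "" ]
  }
}
```

The field name here follows the grok pattern from the earlier reply, which captures into raw-message with a hyphen. The csv filter in the echoed config still reads raw_message with an underscore, so the two names would need to agree.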