Logstash filter for FTP logs

Hello Team,

I need to set up Logstash filters to query data coming from FTP servers. I haven't worked with filters much and only have a rough idea, so I created one for the test logs below:

Thu Jul  4 06:01:45 2019 [pid 43249] [xyz] OK DOWNLOAD: Client \"x.x.x.x\", \"/commonupdater/sitestat.xml\", 118 bytes, 2.64Kbyte/sec","popId":"1","hostIpAddress":"x.x.x.x","host":"ftp-1-2","data_field":"raw","type":"ftp-log
Thu Jul  4 06:20:14 2019 [pid 55668] [xyz] OK DOWNLOAD: Client \"x.x.x.x\", \"/commonupdater/sitestat.xml\", 118 bytes, 2.58Kbyte/sec","popId":"2","hostIpAddress":"x.x.x.x","host":"ftp-2-2","data_field":"raw","type":"ftp-log"
Thu Jul  4 06:20:13 2019 [pid 55666] [xyz] OK LOGIN: Client \"x.x.x.x\", anon password \"NcFTP@\"","popId":"3","hostIpAddress":"x.x.x.x","host":"ftp-2-3","data_field":"raw","type":"ftp-log"
Thu Jul  4 06:20:13 2019 [pid 55667] CONNECT: Client \"x.x.x.x\"","popId":"4","hostIpAddress":"x.x.x.x","host":"ftp-1-2","data_field":"raw","type":"ftp-log"
Thu Jul  4 06:20:11 2019 [pid 43201] CONNECT: Client \"x.x.x.x\"","popId":"5","hostIpAddress":"x.x.x.x","host":"ftp-2-4","data_field":"raw","type":"ftp-log"

From these log lines I need the following fields; the rest can be ignored:

  1. Status: OK DOWNLOAD, FAIL DOWNLOAD, CONNECT
  2. Client IP
  3. File Name: /commonupdater/sitestat.xml, etc.
  4. Size of file: In bytes
  5. Download rate: In bytes/sec

For this I created the following filter pattern:

filter {
  grok {
    match => {"message" => "%{MONTH} +%{MONTHDAY} %{TIME} %{YEAR} (\[%{GREEDYDATA:pidno}\] )?(\[%{WORD:comp}\] )?(%{WORD:status} )?(%{WORD:download}:)?(%{WORD:client} )?(\"%{IPV4:ipaddr}\", )?(\"%{GREEDYDATA:filename}\", )?(%{GREEDYDATA:size} )?(%{GREEDYDATA:speed} )?"}
  }
  mutate {
    remove_field => [ "pidno", "comp", "download", "client" ]
  }
}

Thanks in advance.

Do not bother naming fields if you are just going to drop them. Instead of %{GREEDYDATA:pidno} you can just use %{GREEDYDATA}. An unnamed pattern creates no field, so there is nothing to remove afterwards.

Personally, I would use dissect to strip the consistently formatted prefix from the line, then grok the rest:

    dissect { mapping => { "message" => "%{[@metadata][ts]} %{+[@metadata][ts]->} %{+[@metadata][ts]} %{+[@metadata][ts]} %{+[@metadata][ts]} [pid %{}] %{[@metadata][restOfLine]}" } }
    grok { match => { "[@metadata][restOfLine]" => '(\[%{WORD}\] )?%{DATA:operation}: Client "%{IPV4:ipaddr}",' } }
    if "DOWNLOAD" in [@metadata][restOfLine] {
        grok { match => { "[@metadata][restOfLine]" => 'Client "%{IPV4}", "(?<filepath>[^"]+)", %{INT:filesize:int} bytes, %{NUMBER:rate:float}Kbyte/sec' } }
    }
    date { match => [ "[@metadata][ts]", "EEE MMM dd HH:mm:ss YYYY" ] }
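
The question asked for the download rate in bytes/sec, while the grok above captures it in Kbyte/sec. A minimal sketch of one way to convert it, assuming 1 Kbyte means 1024 bytes (the log format does not say) and using a hypothetical field name rate_bytes_per_sec that is not part of the original filter:

```
filter {
  # Only events matched by the DOWNLOAD grok carry a [rate] field.
  if [rate] {
    ruby {
      # Assumption: 1 Kbyte = 1024 bytes. Use 1000 if the FTP server counts in decimal units.
      # rate_bytes_per_sec is an illustrative field name.
      code => 'event.set("rate_bytes_per_sec", (event.get("rate") * 1024).round)'
    }
  }
}
```

This block would go after the existing grok filters so that [rate] has already been parsed as a float.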

Thanks Badger,

Are there any docs or a site I can refer to in order to study this further?

There's an old post about it, but you can take it as a reference (here)

I tried that one, but I still get the same error.

What error? The set of filters I posted works against the set of data you posted.

After applying the filter you gave, I can't see any logs arriving in Elasticsearch, even though there are no error messages in the Logstash logs.
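
One way to narrow this down, as a sketch rather than a diagnosis: temporarily print events to the console and check whether they reach the output stage at all, and what they look like. The rubydebug codec with metadata enabled also shows the [@metadata] fields used by the dissect and grok filters. Two things worth checking in that output are tags such as _dissectfailure, _grokparsefailure, or _dateparsefailure, and the value of @timestamp: the date filter sets it to the time in the log line (July 2019 for the sample data), so events may be indexed but fall outside a recent time range when searching.

```
output {
  # Temporary debugging output; keep the existing elasticsearch output alongside it.
  stdout {
    codec => rubydebug { metadata => true }
  }
}
```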