I am trying to parse a file whose fields are delimited by @ characters. I am using a CSV filter, and it works properly for every field except the first: the first field ends up containing the entire message, delimiters and all.
Here is an example of one of the indexed documents; note the Unix_timestamp field, which contains the whole record:

```
curl -XGET 'http://10.42.42.242:9200/cld-logstash-2015.09.29/_search?q=message_ID="A7/4E-18701-275E8065"' | python -m json.tool
...
"hits": [
    {
        "_id": "AVAaod7y8PfMTcP3lBp_",
        "_index": "cld-logstash-2015.09.29",
        "_score": 0.37731525,
        "_source": {
            "@timestamp": "2015-09-29T19:43:45.274Z",
            "@version": "1",
            "Result": "R",
            "Unix_timestamp": "1443423602@A7/4E-18701-275E8065@A7/4E-18701-275E8065@2F/AC-18701-275E8065@R@XXXXXXXX@disney.com@WDPRO-DailyReports@email.disney.com@10.198.64.156@1872@esmtp@default@default",
            "batch_ID": "A7/4E-18701-275E8065",
            ...
            "localpart": "XXXXXXX",
            "message": [
                "1443423602@A7/4E-18701-275E8065@A7/4E-18701-275E8065@2F/AC-18701-275E8065@R@XXXXXXXXX@disney.com@WDPRO-DailyReports@email.disney.com@10.198.64.156@1872@esmtp@default@default"
            ],
            "message_ID": "A7/4E-18701-275E8065",
            "protocol": "esmtp",
            "remote_IP": "10.198.64.156",
            ...
        },
        "_type": "logs"
    }
],
...
```
My Logstash configuration file is:
```
# logstash configuration file for messagesystems.com server
#
#
input {
  stdin { }
}

filter {
  csv {
    separator => "@"
    columns => [ "Unix_timestamp", "message_ID", "batch_ID", "connection_ID", "Result",
                 "f_5", "f_6", "f_7", "f_8", "f_9", "f_10", "f_11", "f_12", "f_13" ]
  }

  if [Result] == "R" {
    mutate {
      rename => [ "f_5",  "localpart",
                  "f_6",  "destination_domain",
                  "f_7",  "sender_localpart",
                  "f_8",  "send_domain",
                  "f_9",  "remote_IP",
                  "f_10", "size",
                  "f_11", "protocol",
                  "f_12", "binding_group",
                  "f_13", "binding" ]
    }
  }

  if [Result] == "D" or [Result] == "X" {
#   csv {
#     columns => [ "Unix_timestamp", "message_ID", "batch_ID", "connection_ID", "Result",
#                  "domain", "size", "binding_group", "binding", "retries", "remote_id" ]
#   }
    mutate {
      rename => [ "f_5",  "domain",
                  "f_6",  "size",
                  "f_7",  "binding_group",
                  "f_8",  "binding",
                  "f_9",  "retries",
                  "f_10", "delays",
                  "f_11", "remote_ip" ]
    }
  } else {
    # This has not been converted yet, waiting for the D type to be verified
    # and to find a log file with T type records in it
    csv {
      columns => [ "Unix_timestamp", "unused_1", "unused_2", "unused_3", "Result" ]
    }
  }
}
...
```
All of the fields are parsed properly except the first one. Importing into Elasticsearch works properly, except that the problem with the first field shows up in the Elasticsearch queries and in Kibana, which I guess you'd expect: garbage in, garbage out. Adding a "dummy" field just throws everything else off by one.
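By a "dummy" field I mean prepending an extra, throwaway column name to the columns list, roughly like the sketch below (the name "dummy" is just a placeholder, not what I necessarily used):

```
csv {
  separator => "@"
  # "dummy" is a throwaway placeholder column; prepending it shifts
  # every real column one position to the right
  columns => [ "dummy", "Unix_timestamp", "message_ID", "batch_ID", "connection_ID", "Result",
               "f_5", "f_6", "f_7", "f_8", "f_9", "f_10", "f_11", "f_12", "f_13" ]
}
```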
I have discovered another interesting factoid: it isn't consistently wrong. I threw a bigger file at Logstash and found:
```
curl -XGET 'http://10.42.42.242:9200/cld-logstash-2015.09.29/_search' | python -m json.tool | fgrep "Unix_timestamp"
    "Unix_timestamp": "1443423601@37/4E-18701-175E8065@37/4E-18701-175E8065@EE/AC-18701-175E8065@R@ESPNAPI-DEV@espn.com@nap7-orion@starwave.com@10.198.1.23@1249@esmtp@default@default",
    "Unix_timestamp": "1443510002",
    "Unix_timestamp": "1443510002",
    "Unix_timestamp": "1443510002@BA/2E-19080-2F63A065@BA/2E-19080-2F63A065@56/DA-19080-2F63A065@R@nanopy@nanopy.com@disneymovierewards@disneyhelp.com@10.198.114.51@21442@esmtp@gp_bg@gp_68.71.217.162",
    "Unix_timestamp": "1443510002",
    "Unix_timestamp": "1443423601@57/4E-18701-175E8065@47/4E-18701-175E8065@FE/AC-18701-175E8065@R@Studio.content.protection@disney.com@Studio.Content.Protection@disney.com@10.198.83.192@4256@esmtp@default@default",
    "Unix_timestamp": "1443510002",
    "Unix_timestamp": "1443423601@67/4E-18701-175E8065@67/4E-18701-175E8065@0F/AC-18701-175E8065@R@XXXXXXXXXX@disney.com@WDPRO-DailyReports@email.disney.com@10.198.64.156@4439@esmtp@default@default",
    "Unix_timestamp": "1443423602@A7/4E-18701-275E8065@A7/4E-18701-275E8065@2F/AC-18701-275E8065@R@XXXXXXXXXX@disney.com@WDPRO-DailyReports@email.disney.com@10.198.64.156@1872@esmtp@default@default",
```
Also, the first field is a Unix timestamp. Is there a way to convert it to a date/time in a filter?
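My guess is that the Logstash date filter with the UNIX pattern is the way to do this, roughly like the sketch below, but I haven't tried it yet:

```
# Sketch only: interpret Unix_timestamp as epoch seconds and use it
# as the event's @timestamp (the date filter's default target)
date {
  match => [ "Unix_timestamp", "UNIX" ]
}
```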
Jeff Silverman
jeffrey.x.silverman.-nd@disney.com