Logstash filter to split field into fields of arrays

Hi,

I'm struggling to process some logs in Logstash. So far I managed to convert them to something like this:

"sessions": [
  "http: active                                            0",
  "http: started                                           1",
  "http: successful                                        2",
  "telnet: active                                          0",
  "telnet: started                                         0",
  "telnet: successful                                      0",
  "telnet: failed                                          0",
  "http: error: timeout                                    0",
  "http: error: Error-500                                  0",
  "http: error: Error-400                                  0"
]

I want to further split and group them into output like the one below, so that I can build charts from the values stored in the fields.

"sessions": {
  "http": {
    "active": 0,
    "started": 1,
    "successful": 2,
    "error": {
      "timeout": 0,
      "httpError-500": 0,
      "httpError-400": 0
    }
  },
  "telnet": {
    "active": 0,
    "started": 0,
    "successful": 0,
    "failed": 0
  }
}

I would appreciate any help.
Is it even possible to do that?

Off the top of my head, this could be done with the logstash-filter-ruby plugin, which provides a framework for executing Ruby code, but in my experience, whenever we have to "drop down" to an executable programming language, we're probably not on the right path.

It may be easier to get to your destination by taking a step back; how did you get to the sessions array? What is the shape of your inbound data?
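
For reference, the Ruby route mentioned above might look roughly like the sketch below. It assumes the event already carries the sessions array of strings shown in the question, and it writes to a target field named session_counts, which is a made-up name for illustration (sessions itself is an array, so nested keys cannot be set underneath it). Treat it as a starting point rather than a tested solution.

```
filter {
  ruby {
    code => '
      # Assumption: "sessions" is an array of strings such as
      # "http: error: timeout        0"
      sessions = event.get("sessions")
      if sessions.is_a?(Array)
        sessions.each do |line|
          # Capture the label and the first number that follows it
          m = line.to_s.match(/\A(.+?)\s+(\d+)/)
          next unless m
          # "http: error: timeout" becomes "[http][error][timeout]"
          path = m[1].split(":").map { |k| "[" + k.strip + "]" }.join
          # "session_counts" is a hypothetical target field name
          event.set("[session_counts]" + path, m[2].to_i)
        end
      end
    '
  }
}
```

Only the first numeric column is kept; the x, y and z columns are ignored.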

I also think that the best solution is the simplest one, so I would like to avoid the ruby filter if possible.
My log file looks like this (I'm only interested in the first count column):

12:01:56.017
                                                      count            x              y              z
http: active                                            0
http: started                                           1
http: successful                                       30              5              0            119
http: error: httpError-500                              0              0              0              0
http: error: httpError-400                              0              0              0              0
http: error: timeout                                    0              0              0              0
                                                      count            x              y              z
telnet: active                                          0
telnet: started                                         0
telnet: successful                                      0              0              0              0
telnet: failed                                          0              0              0              0

Oy, that's a pretty rough log format. It has just enough structure to make it look like it should be easy, but enough variance to throw a wrench into just about any of the generic parsing tools.

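If there are only ever going to be these keys, the best way may be a series of groks (note how they're all anchored to the beginning of the line with ^; this lets a pattern that doesn't match fail quickly instead of being retried at every position in the line, which greatly improves performance):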

filter {
  grok { match => { "message" => "^http: active%{SPACE}%{NUMBER:[http][active]:int}"} }
  grok { match => { "message" => "^http: started%{SPACE}%{NUMBER:[http][started]:int}"} }
  grok { match => { "message" => "^http: successful%{SPACE}%{NUMBER:[http][successful]:int}"} }
  grok { match => { "message" => "^http: error: httpError-500%{SPACE}%{NUMBER:[http][error][500]:int}"} }
  grok { match => { "message" => "^http: error: httpError-400%{SPACE}%{NUMBER:[http][error][400]:int}"} }
  grok { match => { "message" => "^http: error: timeout%{SPACE}%{NUMBER:[http][error][timeout]:int}"} }
  grok { match => { "message" => "^telnet: active%{SPACE}%{NUMBER:[telnet][active]:int}"} }
  grok { match => { "message" => "^telnet: started%{SPACE}%{NUMBER:[telnet][started]:int}"} }
  grok { match => { "message" => "^telnet: successful%{SPACE}%{NUMBER:[telnet][successful]:int}"} }
  grok { match => { "message" => "^telnet: failed%{SPACE}%{NUMBER:[telnet][failed]:int}"} }
}
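
One thing to watch with separate grok blocks: by default, every grok that does not match adds a _grokparsefailure tag to the event, so each line will be tagged by the patterns meant for other lines. A possible variant, sketched here with the grok plugin's break_on_match and tag_on_failure options, puts the patterns into a single grok and suppresses the tag. Whether the flat pattern list suits your pipeline is an assumption to check against your own data.

```
filter {
  grok {
    match => {
      "message" => [
        "^http: active%{SPACE}%{NUMBER:[http][active]:int}",
        "^http: started%{SPACE}%{NUMBER:[http][started]:int}",
        "^http: error: timeout%{SPACE}%{NUMBER:[http][error][timeout]:int}",
        "^telnet: failed%{SPACE}%{NUMBER:[telnet][failed]:int}"
        # ...the remaining patterns from the list above follow the same shape
      ]
    }
    # Try every pattern instead of stopping at the first match
    break_on_match => false
    # Header and timestamp lines match nothing; do not tag them as failures
    tag_on_failure => []
  }
}
```

With break_on_match left at its default of true, grok stops at the first pattern that matches, which is also fine when each event holds exactly one log line.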

Yes, it's a nasty log file :) Fortunately it's also static, with only the values changing, so I think I can do it your way. I wouldn't have thought of using a number of groks instead of the one huge one that I started with.
Thanks for your help!
