Logstash filter to split field into fields of arrays

mad_dog · March 29, 2018, 7:37pm

Hi,

I'm struggling to process some logs in Logstash. So far I have managed to convert them into something like this:

"sessions": [
  "http: active                                            0",
  "http: started                                           1",
  "http: successful                                        2",
  "telnet: active                                          0",
  "telnet: started                                         0",
  "telnet: successful                                      0",
  "telnet: failed                                          0",
  "http: error: timeout                                    0",
  "http: error: Error-500                                  0",
  "http: error: Error-400                                  0"
]

I want to divide and group them further, so that the output looks like the example below and I can build charts from the values stored in the fields.

"sessions": {
  "http": {
    "active": 0,
    "started": 1,
    "successful": 2,
    "error": {
      "timeout": 0,
      "httpError-500": 0,
      "httpError-400": 0
    }
  },
  "telnet": {
    "active": 0,
    "started": 0,
    "successful": 0,
    "failed": 0
  }
}

I would appreciate any help.
Is it even possible to do that?

yaauie · March 29, 2018, 8:58pm

Off the top of my head, this could be done with the logstash-filter-ruby plugin, which lets you execute arbitrary Ruby code inside a pipeline. In my experience, though, whenever we have to "drop down" to a general-purpose programming language, we're probably not on the right path.
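For reference, here is a minimal sketch of what the ruby filter route could look like. It assumes the [sessions] field holds the array of strings shown in the question, and it writes the nested result to a new field; the name [session_counts] is hypothetical, chosen so the original array is left untouched. It uses only the event API's event.get and event.set, relying on a field reference such as [session_counts][http][error][timeout] to create the intermediate objects.

```conf
filter {
  ruby {
    # Sketch only: [session_counts] is a hypothetical target field name.
    code => '
      Array(event.get("sessions")).each do |line|
        # Split "http: error: timeout      0" into a key path and a trailing count
        m = line.strip.match(/\A(.+?)\s+(\d+)\z/)
        next unless m
        # "http: error: timeout" -> ["http", "error", "timeout"]
        keys = m[1].split(":").map(&:strip).reject(&:empty?)
        ref = "[session_counts]" + keys.map { |k| "[#{k}]" }.join
        event.set(ref, m[2].to_i)
      end
    '
  }
}
```

Each string is split on its colons into a key path plus its trailing count, so "http: error: timeout ... 0" ends up as [session_counts][http][error][timeout] with the integer value 0.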

It may be easier to get to your destination by taking a step back: how did you get to the sessions array? What is the shape of your inbound data?

mad_dog · March 29, 2018, 9:14pm

I also think the simplest solution is the best one, so I would like to avoid the ruby filter if possible.
My log file looks like this (I'm only interested in the first count column):

12:01:56.017
                                                      count            x              y              z
http: active                                            0
http: started                                           1
http: successful                                       30              5              0            119
http: error: httpError-500                              0              0              0              0
http: error: httpError-400                              0              0              0              0
http: error: timeout                                    0              0              0              0
                                                      count            x              y              z
telnet: active                                          0
telnet: started                                         0
telnet: successful                                      0              0              0              0
telnet: failed                                          0              0              0              0
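Before any field extraction, the whole report (the timestamp line plus the key/count lines) has to arrive as a single event. One way to do that, assuming the log is read with the file input, is the multiline codec keyed on the timestamp line. This is a sketch; the path below is a placeholder.

```conf
input {
  file {
    # Placeholder path: point this at the real log file
    path => "/path/to/sessions.log"
    codec => multiline {
      # A new event starts at each timestamp line such as 12:01:56.017;
      # every line that does not match is appended to the event before it.
      pattern => "^\d{2}:\d{2}:\d{2}\.\d{3}"
      negate => true
      what => "previous"
    }
  }
}
```

Because Logstash regular expressions follow Ruby semantics, where ^ matches at the start of any line, ^-anchored grok patterns can still match individual lines inside such a multi-line message. The last report in the file is only emitted once the next timestamp line arrives, unless the codec's auto_flush_interval option is set.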

yaauie · April 3, 2018, 5:15pm

Oy, that's a pretty rough log format. It has just enough structure to make it look like it should be easy, but enough variance to throw a wrench into just about any of the generic parsing tools.

If there are only ever going to be these keys, the best approach may be a series of groks (note how they're all anchored to the beginning of the line with ^; this greatly improves performance, because a line that doesn't match fails fast instead of being retried at every position):

filter {
  # One grok per key; each captures only the first count column, as an integer
  grok { match => { "message" => "^http: active%{SPACE}%{NUMBER:[http][active]:int}"} }
  grok { match => { "message" => "^http: started%{SPACE}%{NUMBER:[http][started]:int}"} }
  grok { match => { "message" => "^http: successful%{SPACE}%{NUMBER:[http][successful]:int}"} }
  grok { match => { "message" => "^http: error: httpError-500%{SPACE}%{NUMBER:[http][error][500]:int}"} }
  grok { match => { "message" => "^http: error: httpError-400%{SPACE}%{NUMBER:[http][error][400]:int}"} }
  grok { match => { "message" => "^http: error: timeout%{SPACE}%{NUMBER:[http][error][timeout]:int}"} }
  grok { match => { "message" => "^telnet: active%{SPACE}%{NUMBER:[telnet][active]:int}"} }
  grok { match => { "message" => "^telnet: started%{SPACE}%{NUMBER:[telnet][started]:int}"} }
  grok { match => { "message" => "^telnet: successful%{SPACE}%{NUMBER:[telnet][successful]:int}"} }
  grok { match => { "message" => "^telnet: failed%{SPACE}%{NUMBER:[telnet][failed]:int}"} }
}
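A possible variation on the same idea, sketched here and not tested against this log: keep the small anchored patterns but list them in a single grok filter, with break_on_match set to false so grok tries every pattern instead of stopping at the first match. The field names are the same as in the filters above; whether this is any faster than ten separate grok filters would need measuring.

```conf
filter {
  grok {
    # Try every pattern; the default (true) stops after the first successful match
    break_on_match => false
    match => {
      "message" => [
        "^http: active%{SPACE}%{NUMBER:[http][active]:int}",
        "^http: started%{SPACE}%{NUMBER:[http][started]:int}",
        "^http: successful%{SPACE}%{NUMBER:[http][successful]:int}",
        "^http: error: httpError-500%{SPACE}%{NUMBER:[http][error][500]:int}",
        "^http: error: httpError-400%{SPACE}%{NUMBER:[http][error][400]:int}",
        "^http: error: timeout%{SPACE}%{NUMBER:[http][error][timeout]:int}",
        "^telnet: active%{SPACE}%{NUMBER:[telnet][active]:int}",
        "^telnet: started%{SPACE}%{NUMBER:[telnet][started]:int}",
        "^telnet: successful%{SPACE}%{NUMBER:[telnet][successful]:int}",
        "^telnet: failed%{SPACE}%{NUMBER:[telnet][failed]:int}"
      ]
    }
  }
}
```

With the ten separate filters instead, any event that lacks one of the keys gets tagged _grokparsefailure by the filter that didn't match; adding tag_on_failure => [] to each grok suppresses that tag.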

mad_dog · April 4, 2018, 7:39am

Yes, it's a nasty log file. Fortunately, it's also static, with only the values changing, so I think I can do it your way. I wouldn't have thought of using a number of groks instead of the one huge one I started with.
Thanks for your help!

system · May 2, 2018, 7:39am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
