I have been trying to figure out how to extract fields such as the client IP, user agent, and HTTP status code from an Apache/NGINX log in combined format.
I have noticed that Filebeat has regex support for multiline recognition, but I am at a loss as to why the fields setting does not have a pattern option where I could supply a regex that extracts the field value from the log line, rather than a hard-coded value.
Take for instance this entry:
Mar 25 17:35:48 slb-99-000 slb: 23.215.130.23 - - [25/Mar/2016:17:35:48 -0400] "GET /service/program?id=10584&format=json HTTP/1.1" 200 1363 "-" "Python-urllib/2.7"
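From the message above I would like to extract the client's IP address, the HTTP status code, the response size in bytes, and the user agent. Something along these lines is what I have in mind (hypothetical syntax, not an existing Filebeat option):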
fields:
  - name: client_ip
    pattern: '^(?:\S+ ){5}(\S+)'
  - name: http_code
    pattern: 'HTTP/\S+" (\d{3})'
  ...
I don't need grok; plain regex would do the job in this case.
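To make the intent concrete, here is a rough Python sketch of the extraction I am after, run against the sample entry above. The regex, the group names (client_ip, http_code, bytes, user_agent), and the helper extract_fields are all my own illustration and not anything Filebeat provides; the pattern also assumes the syslog prefix always has the shape shown in the sample.

```python
import re

# Illustrative pattern only (an assumption, not a Filebeat feature):
# a syslog prefix followed by the combined-format access log fields.
LINE_RE = re.compile(
    r'^\w{3}\s+\d+ [\d:]+ \S+ \S+ '   # syslog timestamp, host, program tag
    r'(?P<client_ip>\S+) \S+ \S+ '    # client IP, identd, remote user
    r'\[[^\]]+\] '                    # request timestamp in brackets
    r'"[^"]*" '                       # request line
    r'(?P<http_code>\d{3}) '          # HTTP status code
    r'(?P<bytes>\d+|-) '              # response size in bytes, or "-"
    r'"[^"]*" '                       # referrer
    r'"(?P<user_agent>[^"]*)"'        # user agent
)

def extract_fields(line):
    """Return a dict of the named groups, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None

sample = (
    'Mar 25 17:35:48 slb-99-000 slb: 23.215.130.23 - - '
    '[25/Mar/2016:17:35:48 -0400] '
    '"GET /service/program?id=10584&format=json HTTP/1.1" '
    '200 1363 "-" "Python-urllib/2.7"'
)

print(extract_fields(sample))
```

For the sample entry this yields client_ip 23.215.130.23, http_code 200, bytes 1363, and user_agent Python-urllib/2.7. That is all I would want the shipper to do before sending the event on.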
This message is forwarded from NGINX to syslog-ng (syslog's integration with Elasticsearch is rather poor, hence choosing Filebeat to forward to Elasticsearch).
Any ideas how to go about it without using Logstash?