Can the "field" property be used to extract a value from the message body, and why not?

(discuss03) #1

I have been trying to figure out how to extract fields such as the client IP, user agent, and HTTP status code from an Apache/NGINX log in combined format.

I have noticed that Filebeat supports regex for multiline recognition, but I am at a loss as to why the "field" property has no pattern option, where I could supply a regex extractor for the field value instead of a hard-coded value.

Take for instance this entry:

Mar 25 17:35:48 slb-99-000 slb: - - [25/Mar/2016:17:35:48 -0400] "GET /service/program?id=10584&format=json HTTP/1.1" 200 1363 "-" "Python-urllib/2.7"

From the message above, I would like to extract the client IP address, the HTTP status code, the response size in bytes, and the user agent.

- field
  - name: client_ip
  - pattern: ^(?:\S+ ){5}(\S+)
- field
  - name: http_code
  - pattern: HTTP/[\d.]+" (\d{3})

I don't need grok; plain regex would do the job in this case.
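
Since plain regex is all that is needed here, the extraction itself can be sketched in a few lines of Python. This is a minimal sketch outside Filebeat, not a Filebeat feature. It assumes the layout of the entry above (a syslog prefix followed by NGINX's combined format), treats the client address as optional because the sample entry has none before the `- -` fields, and uses a `parse_line` helper and group names (`client_ip`, `status`, `bytes`, `user_agent`, ...) that are purely illustrative.

```python
import re
from typing import Optional

# Illustrative pattern: syslog prefix + NGINX/Apache "combined" format.
# Group names are hypothetical choices, not anything Filebeat defines.
COMBINED_IN_SYSLOG = re.compile(
    r'^\w{3} +\d+ [\d:]{8} \S+ [^:]+: '   # syslog: timestamp, host, program tag
    r'(?:(?P<client_ip>\S+) )?'           # client address; optional (absent in the sample)
    r'\S+ \S+ '                           # ident and remote user (often "- -")
    r'\[(?P<time>[^\]]+)\] '              # [25/Mar/2016:17:35:48 -0400]
    r'"(?P<request>[^"]*)" '              # "GET /path HTTP/1.1"
    r'(?P<status>\d{3}) '                 # HTTP status code
    r'(?P<bytes>\d+|-) '                  # response body size, "-" if none
    r'"(?P<referer>[^"]*)" '              # referer
    r'"(?P<user_agent>[^"]*)"$'           # user agent
)


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of a matching line, or None if it does not match."""
    match = COMBINED_IN_SYSLOG.match(line.rstrip("\n"))
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = ('Mar 25 17:35:48 slb-99-000 slb: - - [25/Mar/2016:17:35:48 -0400] '
              '"GET /service/program?id=10584&format=json HTTP/1.1" 200 1363 '
              '"-" "Python-urllib/2.7"')
    print(parse_line(sample))
```

Named groups keep the regex-to-field mapping explicit, and matching quoted fields with `[^"]*` rather than `.+` stops one greedy group from swallowing the rest of the line.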

This message is forwarded from NGINX to syslog-ng. (syslog-ng's integration with Elasticsearch is rather poor, which is why I chose Filebeat to forward to Elasticsearch.)

Any ideas on how to go about this without using Logstash?

(Christian Dahlqvist) #2

Parsing messages is currently not possible in Filebeat, as it is designed to be lightweight. The kind of processing you are describing is exactly what Logstash was designed for, and Logstash is currently the recommended way to do it.

Elasticsearch 5.0 will introduce a new ingest node type, which will allow some parsing and formatting directly in Elasticsearch before the data is indexed. Filebeat will then be able to send data directly to Elasticsearch, as long as the ingest node's capabilities meet your processing requirements.
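
For the ingest node route, a pipeline is a JSON definition stored in Elasticsearch under `_ingest/pipeline/<id>`. The Python sketch here builds one with a single grok processor on the `message` field and can PUT it using only the standard library. It rests on several assumptions: the pipeline id and field names are hypothetical, the grok pattern reuses standard pattern names (`SYSLOGTIMESTAMP`, `IPORHOST`, `HTTPDATE`, `QS`, ...) and has not been tested against this log, and Filebeat would need an Elasticsearch output that supports a `pipeline` setting to route events through it. Note that the ingest node's text parsing is grok-based, although a grok pattern may also contain ordinary regex, such as the optional client-address group.

```python
import json
import urllib.request

# Hypothetical pipeline id; the shipper would have to reference the same name.
PIPELINE_ID = "nginx-combined-via-syslog"

# Assumed grok pattern built from standard pattern names. The client address
# is an optional plain-regex group because the sample line has none.
GROK_PATTERN = (
    r'%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_host} %{DATA:syslog_tag}: '
    r'(?:%{IPORHOST:client_ip} )?%{USER:ident} %{USER:auth} '
    r'\[%{HTTPDATE:timestamp}\] "%{DATA:request}" '
    r'%{NUMBER:status} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:user_agent}'
)

# Pipeline body: one grok processor that parses the raw "message" field.
PIPELINE = {
    "description": "Parse NGINX combined-format lines forwarded through syslog",
    "processors": [
        {"grok": {"field": "message", "patterns": [GROK_PATTERN]}},
    ],
}


def put_pipeline(es_url: str = "http://localhost:9200") -> int:
    """PUT the pipeline definition to Elasticsearch and return the HTTP status."""
    request = urllib.request.Request(
        url=f"{es_url}/_ingest/pipeline/{PIPELINE_ID}",
        data=json.dumps(PIPELINE).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Print the definition instead of sending it, so no cluster is required.
    print(json.dumps(PIPELINE, indent=2))
```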
