Custom Fields Value From Path Element

Let's say I have these prospector definitions:

filebeat.prospectors:

# log for `app-a`
- input_type: log
  paths:
    - /progs/app-a/*/tomcat/*/logs/app-a.log
    - /progs/app-a/*/tomcat/*/logs/catalina.out
  tags: ["tomcat"]
  fields:
    source_program: app-a
  multiline.pattern: "^[[:digit:]]+|^$"
  multiline.negate: true
  multiline.match: after

# log for `app-b`
- input_type: log
  paths:
    - /progs/app-b/*/tomcat/*/logs/app-b.log
    - /progs/app-b/*/tomcat/*/logs/catalina.out
  tags: ["tomcat"]
  fields:
    source_program: app-b
  multiline.pattern: "^[[:digit:]]+|^$"
  multiline.negate: true
  multiline.match: after

As you can see, I basically add a source_program field with the name of the application.

Our application deployment is automated, and the deployment path follows this convention:

/progs/{APPLICATION_NAME}/{DEPLOYMENT_VERSION}/tomcat/{TOMCAT_VERSION}/

My question is: could I use some sort of automated value for fields, derived from elements of the path, similar to named groups in a regular expression?

For example, could I do something like this?

- input_type: log
  paths:
    - /progs/app-b/*/tomcat/*/logs/app-b.log
    - /progs/app-b/*/tomcat/*/logs/catalina.out
  tags: ["tomcat"]
  fields:
    source_program: `path.element[1]`
    deploy_version: `path.element[2]`
    tomcat_version: `path.element[4]`
  multiline.pattern: "^[[:digit:]]+|^$"
  multiline.negate: true
  multiline.match: after

Is it possible to achieve this using only Filebeat and Elasticsearch, i.e. without using Logstash?

Thank you.


Have you had a look at the Elasticsearch ingest node?

Thank you for your reply.

I'm not familiar with the Elasticsearch ingest node. So far, from browsing the documentation, it seems what I need is the Append processor, but I'm not sure how to split a field value on the path separator and then pick out the individual parts.
It also seems I need a bit of logic to exclude other data from this treatment: we have other logs which do not follow the path convention (which is why we use tags like "webserver", "tomcat", etc.), and I'm not sure how to do that with this "pipeline" thing.

A small example would be nice, thank you.

Anyway, if it is too convoluted we might have to bite the bullet and run Logstash (something I want to avoid).

You can make the pipeline optional; see the pipeline and pipelines settings of the Elasticsearch output. If the resolved pipeline value is empty, no pipeline will be selected.

e.g.

filebeat.prospectors:
- paths: ["/progs/app-b/*/tomcat/*/logs/app-b.log"]
  fields.pipeline: "analyze_source"
  ..
- paths: ["/progs/app-b/tomcat/non-standard/logs/app-b.log"]
  ...

output.elasticsearch:
  ...
  pipeline: '%{[fields.pipeline]}'

With this configuration, all events from the first prospector will be sent to the analyze_source pipeline, and events from the second prospector will not be sent to any ingest pipeline.

In Elasticsearch you can try the grok processor. It's basically regular expressions on steroids, supporting 'templates', extracting fields, and converting strings to (e.g. numeric) types. The grok processor even supports multiple patterns, in case you have different schemas. Ingest node also has some 'failure' handling in case content is unparseable by grok, so you could use that as well and always send all events to the pipeline. I recommend testing in the Kibana console using the simulate API. If you really need something custom, you can use the script processor with Painless ([1], [2], [3]).
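
For example, a pipeline along these lines might work (a rough, untested sketch: it assumes Filebeat puts the full log file path in the source field, reuses the analyze_source name from the config above, and the target field names source_program, deploy_version and tomcat_version are simply taken from your question):

PUT _ingest/pipeline/analyze_source
{
  "description": "sketch: extract app name, deploy version and tomcat version from the log path (assumes the path is in the 'source' field)",
  "processors": [
    {
      "grok": {
        "field": "source",
        "patterns": [
          "/progs/%{DATA:source_program}/%{DATA:deploy_version}/tomcat/%{DATA:tomcat_version}/logs/%{GREEDYDATA}"
        ]
      }
    }
  ]
}

You can then check the result with the simulate API before wiring it into Filebeat, e.g.:

POST _ingest/pipeline/analyze_source/_simulate
{
  "docs": [
    {
      "_source": {
        "source": "/progs/app-a/1.2.0/tomcat/8.5.11/logs/app-a.log"
      }
    }
  ]
}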

I don't see how the Append processor is what you need. If grok is not doing the trick, you can try the split processor plus a script processor to assign the individual fields (on the other hand, I think Painless lets you use some of the Java API, i.e. you could call split there directly).
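
To illustrate that route, a minimal sketch (again untested; it assumes the path is in the source field, that your Elasticsearch version supports target_field on the split processor, and it uses the 5.x-era inline key for the script, which newer releases call source):

PUT _ingest/pipeline/analyze_source
{
  "description": "sketch: split the 'source' path on '/' and copy selected elements into fields (assumes split supports target_field)",
  "processors": [
    {
      "split": {
        "field": "source",
        "separator": "/",
        "target_field": "path_parts"
      }
    },
    {
      "script": {
        "lang": "painless",
        "inline": "ctx.source_program = ctx.path_parts[2]; ctx.deploy_version = ctx.path_parts[3]; ctx.tomcat_version = ctx.path_parts[5];"
      }
    },
    {
      "remove": {
        "field": "path_parts"
      }
    }
  ]
}

With a path like /progs/app-a/1.2.0/tomcat/8.5.11/logs/app-a.log, splitting on '/' yields an array whose elements 2, 3 and 5 are the application name, deployment version and Tomcat version, which is why those indices are used above.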
