How to enrich events with "dynamic" data from a file

stefws · March 19, 2023, 12:16pm

I'm tinkering with how to enrich Filebeat events by tagging/labelling them with data picked from a text file that changes infrequently (every few days, weeks, or months). We're talking about application version data: whenever server log events are shipped for an application instance, I would like to add data that is currently stored in a per-application version file.

Application redeployments should probably be changed to push such version information to some kind of fast lookup DB, rather than having it read from the application version file for every log event shipped. I don't know if it's even possible to have a processor read from a file...

We have multiple application servers, each running multiple application instances, and each instance has its own monitored server logs. Every application instance has its own version information that should be tagged onto events from its server logs.

All server log events are currently parsed by central ingest pipelines, one per server log type, on the ingest nodes. I don't know whether it would be best for the pipeline to read such data through a DB lookup processor/plugin, or whether it could be done locally by Filebeat on each application server.

Currently there's just a single server-wide Filebeat shipper per application server, i.e. one shipping data for all application instances on the server. Another solution could of course be to run a Filebeat instance per application instance, redeploy/start/stop it in parallel with the application instance, and simply deploy the version info into environment variables.

Only this would add multiple extra systemd services... even though we already have one per application instance...

Ideas appreciated, TIA!

leandrojmp · March 19, 2023, 1:06pm

Can you share an example of the data that you want to enrich?

Since you mentioned that you are using ingest pipelines, you may use an enrich processor to enrich your data.

But depending on what you want to enrich and the frequency, sometimes just a simple set processor may work.

stefws · March 19, 2023, 2:55pm

@leandrojmp I simply want to add an extra field to events that holds the application version string, e.g.:

app.name: "<App Name>"
app.version: "X.Y.Z"

I currently get app.name from a component of log.file.path via a grok processor in the pipelines, but I would also like to get app.version, which deployment stamps into a text file under the application instance's 'home dir', above the server log dir.

leandrojmp · March 19, 2023, 3:14pm

So, the version would be in a file inside the application folder, right?

There is no easy way to do that. Filebeat does not have any enrich functionality, and the enrich processor in an ingest pipeline needs an index already populated with the information you want to enrich with.

You may add a custom field using environment variables, but you would need to change the value of the variable and restart Filebeat every time there is a version update.

If you can, I would suggest that you add the version to the path and extract it using a grok or dissect processor, the same way you get the app.name.
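
For illustration only, assuming a hypothetical layout such as `/opt/apps/<app>/<version>/logs/server.log`, the extraction that a grok or dissect processor would perform can be sketched with a regular expression in JavaScript. The path layout and the `appFromPath` helper are made up for this example; the real pipeline would use the processor's own pattern syntax.

```javascript
// Hypothetical path layout: /opt/apps/<app name>/<version>/logs/<file>
const PATH_PATTERN = /^\/opt\/apps\/([^/]+)\/([^/]+)\/logs\/[^/]+$/;

// Return { name, version } parsed from a log file path,
// or null if the path does not follow the assumed layout.
function appFromPath(logPath) {
  const match = PATH_PATTERN.exec(logPath);
  if (match === null) return null;
  return { name: match[1], version: match[2] };
}
```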

stefws · March 19, 2023, 3:24pm

@leandrojmp Yeah, I also thought about adding the version to the log file path, only that's not so nice for our application admins... until they have switched completely to watching/searching logs through Kibana.

Would it be possible to create an enrich index + processor, have deployment maintain the version information in it (as the fast lookup DB), and then have the pipeline look up the current version based on a few terms from the incoming doc, like server + instance_id?

leandrojmp · March 19, 2023, 4:25pm

You can, this is basically what the enrich processor does.

You need a source index; from this source index you create an enrich policy, and this policy is then used in the enrich processor.

What you will need to do is find a way to update the source index when you have a new version, and you will also need to execute the enrich policy again, since enrich policies do not have a built-in way to update the enrich index automatically.

But an enrich policy matches on one field only. Since you need to match on both the server and the instance id, you would need a field holding a composite of those values and use this field in your enrich policy.
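
A minimal sketch of that composite-key approach, written as plain JavaScript that only builds the request bodies a deployment script might send. The index name `app-versions`, the policy name `app-version-policy`, the field `app.lookup_key`, and the `server|instance_id` key format are all hypothetical; the `match` policy shape and the enrich processor options (`policy_name`, `field`, `target_field`, `ignore_missing`) are the ones I understand the Elasticsearch enrich feature to use, so check them against the documentation for your version.

```javascript
// Hypothetical names: adjust to your own indices and fields.
const SOURCE_INDEX = 'app-versions';
const POLICY_NAME = 'app-version-policy';

// Join server and instance id into the single field an enrich policy can match on.
function lookupKey(server, instanceId) {
  return `${server}|${instanceId}`;
}

// Document the deployment would index into the source index on each release.
function sourceDoc(server, instanceId, version) {
  return { lookup_key: lookupKey(server, instanceId), version };
}

// Body for PUT _enrich/policy/<POLICY_NAME>
function enrichPolicyBody() {
  return {
    match: {
      indices: SOURCE_INDEX,
      match_field: 'lookup_key',
      enrich_fields: ['version'],
    },
  };
}

// Enrich processor for the ingest pipeline; assumes an earlier processor
// has already written the composite key to `app.lookup_key` on the event.
function enrichProcessor() {
  return {
    enrich: {
      policy_name: POLICY_NAME,
      field: 'app.lookup_key',
      target_field: 'app.enriched',
      ignore_missing: true,
    },
  };
}
```

With this shape the matched version would land under `app.enriched.version`, and the deployment would still have to re-execute the policy after every change to the source index.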

Another option would be to create an extra ingest pipeline and call it from your main ingest pipeline using the pipeline processor.

In this extra ingest pipeline you would have as many set processors as you need to conditionally add the app.version field, something like this:

PUT _ingest/pipeline/set-app-versions
{
  "processors": [
    {
      "set": {
        "description": "set app version",
        "if": "ctx.app?.name == 'appName' && ctx.server?.name == 'serverName'",
        "field": "app.version",
        "value": "X.Y.Z"
      }
    }
  ]
}

You could then automate updates to this pipeline in your deployment to change the values and conditions.
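
One way a deploy step could regenerate that pipeline, sketched in JavaScript. The `deployments` list, its field names, and the server/app values are hypothetical, and the function only builds the request body for `PUT _ingest/pipeline/set-app-versions`; actually sending it to the cluster is left out.

```javascript
// Quote a value for use inside a Painless single-quoted string literal.
function painlessQuote(value) {
  return "'" + String(value).replace(/\\/g, '\\\\').replace(/'/g, "\\'") + "'";
}

// Build one conditional set processor per (server, app) deployment.
function buildVersionPipeline(deployments) {
  return {
    description: 'Set app.version per server and application',
    processors: deployments.map(({ server, app, version }) => ({
      set: {
        description: `set app version for ${app} on ${server}`,
        if: `ctx.app?.name == ${painlessQuote(app)} && ctx.server?.name == ${painlessQuote(server)}`,
        field: 'app.version',
        value: version,
      },
    })),
  };
}

// Hypothetical input, e.g. collected from each instance's version file at deploy time.
const body = buildVersionPipeline([
  { server: 'srv01', app: 'billing', version: '1.4.2' },
  { server: 'srv02', app: 'billing', version: '1.5.0' },
]);
console.log(JSON.stringify(body, null, 2));
```

Regenerating the whole pipeline from one list keeps the conditions and values in sync, at the cost of one set processor per server and application pair.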

stefws · March 19, 2023, 4:42pm

Thanks, will dig further into this path...

Sunile_Manjee · March 20, 2023, 3:52am

Ingest pipelines are the way to go. Out of curiosity, would it be possible to use the script processor like this to fetch a value from another file?

filebeat.inputs:
- type: log
  paths:
    - /path/.../file.log

processors:
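  - script:
      lang: javascript
      source: |
        // Assumes a Node-style 'fs' module, which Filebeat's embedded
        // JavaScript engine may not provide.
        var fs = require('fs');
        function process(event) {
          var content = fs.readFileSync('/path/to/lookup/file.txt', 'utf8');
          event.Put('app.version', content.trim());
        }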

stefws · March 20, 2023, 9:06pm

This might work, but it would be very inefficient to open a file for every doc to be indexed.
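
For comparison, a lookup that re-reads the file only when its modification time changes would avoid most of that cost. A minimal Node.js sketch of the idea, not Filebeat processor code; `makeVersionReader` is a hypothetical helper and the path in the usage note is illustrative:

```javascript
const fs = require('fs');

// Return a function that yields the trimmed file content, re-reading
// the file only when its modification time has changed.
function makeVersionReader(path) {
  let cachedMtimeMs = null;
  let cachedValue = null;
  return function readVersion() {
    const { mtimeMs } = fs.statSync(path); // one stat per call, no open/read
    if (mtimeMs !== cachedMtimeMs) {
      cachedValue = fs.readFileSync(path, 'utf8').trim();
      cachedMtimeMs = mtimeMs;
    }
    return cachedValue;
  };
}
```

A caller would create the reader once, e.g. `const readVersion = makeVersionReader('/path/to/lookup/file.txt')`, and call `readVersion()` per event; it still costs one `stat` per event.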

Sunile_Manjee · March 21, 2023, 3:10am

Yep, that's correct. Not optimal at scale.
