Using custom pipeline with filebeat module

I've read a hundred pages of docs/forum posts/random blogs and am just as confused as I was to begin with - I took a break from this project for a few months, but nothing magically changed :wink:

Overall Scenario:

  • The filebeat osquery module has almost all values saved as strings.
  • I want the numeric types to actually be saved as numbers to enable better math and graph capabilities. (See post from July: OSQuery field types)
  • Ingest processors are the solution for converting fields/values
  • Due to how the osquery module works, the fields that need to be converted can't be accessed by processors defined within the filebeat module yml; the conversion has to happen in an elasticsearch pipeline (See Using a processor in a filebeat module may or may not actually find fields)
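
For reference, a conversion pipeline along these lines is roughly what the scenario above is aiming for. This is only a minimal sketch: the pipeline name osquery_numeric_example and the fields osquery.result.columns.pid and osquery.result.columns.resident_size are hypothetical placeholders, so substitute whichever columns your own osquery queries produce.

```console
PUT _ingest/pipeline/osquery_numeric_example
{
  "description": "Sketch: convert string-typed osquery columns to numbers (field names are assumptions)",
  "processors": [
    {
      "convert": {
        "field": "osquery.result.columns.pid",
        "type": "long",
        "ignore_missing": true
      }
    },
    {
      "convert": {
        "field": "osquery.result.columns.resident_size",
        "type": "long",
        "ignore_missing": true
      }
    }
  ]
}
```

ignore_missing is set so that results from queries that don't return these columns pass through untouched instead of failing the pipeline.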

Problem:

  • I cannot get a pipeline processor to actually apply to a message; it seems to be totally ignored

Steps performed:

  1. Ensured filebeat/elasticsearch/osquery were all talking to each other fine, data shows up as expected, default pipelines installed via setup --pipelines --modules osquery etc
  2. Verified I could successfully add a field directly from the filebeat osquery module yml [my-osquery.yml]. This worked.
  3. Removed that block from the my-osquery.yml
  4. Created a custom pipeline that would add a field
  5. Updated the main filebeat config [my-filebeat.config.yml] to point to my pipeline and to use a new index pattern
  6. Stopped and restarted filebeat with that new config
  7. Checked new data, did not see the additional field
  8. Used Kibana console to run a POST _ingest/pipeline/test_osquery_fixer/_simulate with the _source from a real osquery message. Verified that the custom pipeline does work correctly in that mode.
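
The kind of _simulate check from step 8 can also be run against an inline pipeline definition, which makes it easy to try out a single processor before storing anything. A minimal sketch, assuming a hypothetical osquery.result.columns.pid field and a made-up value (not copied from a real message):

```console
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Sketch: inline pipeline used only for this simulation",
    "processors": [
      {
        "convert": {
          "field": "osquery.result.columns.pid",
          "type": "long",
          "ignore_missing": true
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "osquery": {
          "result": {
            "columns": {
              "pid": "4242"
            }
          }
        }
      }
    }
  ]
}
```

In the response, pid should come back as a number rather than a quoted string. Keep in mind this only shows that the processor works, not that filebeat actually routes events through the pipeline.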

Random things tried:

  • updated my pipeline to have a step where it uses the pipeline processor to run the original osquery pipeline too, just in case that was needed, but that didn't work
  • added a pipeline config line to the osquery module yml randomly, didn't expect that to work

Here is my pipeline:

GET _ingest/pipeline/test_osquery_fixer

{
  "test_osquery_fixer" : {
    "description" : "testing pipeline",
    "processors" : [
      {
        "pipeline" : {
          "name" : "filebeat-7.4.2-osquery-result-pipeline"
        }
      },
      {
        "set" : {
          "field" : "custompipe",
          "value" : "testosqueryfixer",
          "ignore_failure" : false
        }
      }
    ]
  }
}

I added the pipeline line to my-filebeat-config.yml, didn't even try to do conditional-only-for-osquery-docs yet:

output.elasticsearch:
  hosts: ["asdf.local:9200"]
  pipeline: "test_osquery_fixer"

Even have the pipeline name in the modules.d/my-osquery.yml config...

# Module: osquery
- module: osquery
  result:
    enabled: true
    var.use_namespace: true
    input:
      pipeline: "test_osquery_fixer"

Final results are:

  • Under _simulate the pipeline works
  • For messages being received by filebeat the pipeline doesn't work

Oh and with logging set to debug, I do see that the pipeline name that I specify isn't being used:

2019-11-29T22:53:06.582-0800	DEBUG	[processors]	processing/processors.go:183	Publish event: {
  "@timestamp": "2019-11-30T06:53:06.582Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.4.2",
    "pipeline": "filebeat-7.4.2-osquery-result-pipeline"
  },
  "ecs": {
    "version": "1.1.0"
  },
  "host": {

And after letting it run for a few minutes, the main logfile (/logs/filebeat) doesn't have the name of my custom pipeline anywhere in it, or any ERROR lines.

In the meantime, I've just manually added a

  {
    "pipeline" : {
      "name" : "osquery-custom-branch-pipe"
    }
  }

at the end of the processor list in the auto-loaded/default osquery module pipeline. I'll have to remember to do that any time I update filebeat of course.
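
A quick way to check whether that manual edit is still in place after an upgrade is to list the module pipelines with a wildcard. This is just a sketch; the pattern assumes the naming scheme seen in the debug output above (filebeat-7.4.2-osquery-result-pipeline) carries over to other versions.

```console
GET _ingest/pipeline/filebeat-*-osquery-result-pipeline
```

If the newest pipeline in the response has no "pipeline" entry at the end of its processors list, the custom branch needs to be added again.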

I think that adding a result.input.pipeline entry to the module's yml should do the same thing that I did manually: run the custom pipeline automatically at the end of the default osquery pipeline.

Which fields, in which module, are you trying to convert to numbers?
Also, for the pipeline, I believe you can delete the old pipeline and then restart filebeat to load the new one. For example:

GET _ingest/pipeline
DELETE _ingest/pipeline/filebeat-8.0.0-aws-s3_server_access_log-pipeline