Hi all, my colleague and I have been working on using the memcached filter plugin. We've got memcached fed with inventory data about our various network equipment.
The goal: I want to label all syslog traffic from our networking devices with information about what that device is, such as:
- is it a distribution switch; access switch; wireless controller; firewall, etc.
- which building and network cabinet it is in, etc.
 ... and more besides.
Pretty useful if you want to aggregate a huge volume of network logs.
To this end I've introduced the following configuration:
filter {
  if ...is-syslog... and [syslog-host-ip] == "... single IP ..." {
    memcached {
      id => "networking.memcached.32"
      hosts => ["127.0.0.1"]
      namespace => "network_devices"
      get => {
        "%{syslog-host-ip}" => "[@metadata][network_devices_temp]"
      }
    }
    json {
      id => "networking.json.41"
      source => "[@metadata][network_devices_temp]"
      target => "source_device"
    }
  }
}
This works (now), but I had to constrain it to just a single IP, as I noticed it was performing worse than my elasticsearch output (which is generally the bottleneck).
# curl -s "http://127.0.0.1:9601/_node/stats/pipelines/main" | jq '.pipelines.main.plugins.filters[] | select(.id == "networking.memcached.32")'
{
  "id": "networking.memcached.32",
  "name": "memcached",
  "events": {
    "in": 15143,
    "duration_in_millis": 2230,
    "out": 15143
  }
}
(the 32 in the ID is based on the line-number; just as a way of making the IDs unique without thinking too hard about how to name things)
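For scale, here is a small Ruby sketch of the arithmetic on those counters. It assumes `duration_in_millis` is the cumulative time events spent inside this filter; the helper names are made up for illustration. With the numbers above it comes out at roughly 0.15 ms per event.

```ruby
# Counters copied from the node stats output above.
EVENTS_IN   = 15_143
DURATION_MS = 2_230

# Average time spent inside the filter per event, in milliseconds.
# (Hypothetical helper, not part of Logstash.)
def ms_per_event(events, duration_ms)
  duration_ms.to_f / events
end

# Rough ceiling on events per second for a single worker thread,
# if that worker did nothing except run this one filter.
def events_per_second(events, duration_ms)
  events / (duration_ms / 1000.0)
end

puts format("%.3f ms per event", ms_per_event(EVENTS_IN, DURATION_MS))
puts format("%.0f events/s per worker (this filter only)",
            events_per_second(EVENTS_IN, DURATION_MS))
```

This is only an average over a small sample from one device, so it may not hold once more traffic (and more distinct keys) goes through the lookup.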
Now that I know it is functionally working, I want to start introducing it to more traffic so I can observe the impact on the pipeline.
I would like to be able to process, say, every Nth record, or some percentage of the traffic, or possibly every 10 sequential records out of every 1000 (or have a duty cycle of 5 seconds per minute).
I suppose I could use a ruby plugin to do some percentage, but I'm wondering if there is something more commonly used, or perhaps something like awk's NR variable? I know the 'drop' filter has a percentage option, but I'm not interested in dropping them.
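To make the sampling schemes concrete, here is a minimal sketch of the gating logic in plain Ruby. All of the method names are hypothetical (nothing here is a Logstash API), and the record counter is assumed to be 1-based like awk's NR.

```ruby
# Hypothetical helpers for illustration only; none of these are Logstash APIs.

# Every Nth record: true for records N, 2N, 3N, ...
# `counter` is a 1-based record number, like awk's NR.
def every_nth?(counter, n)
  (counter % n).zero?
end

# Burst sampling: true for the first `burst` records of every `period` records,
# e.g. 10 sequential records out of every 1000.
def in_burst?(counter, burst, period)
  ((counter - 1) % period) < burst
end

# Percentage sampling: true for roughly `percent` percent of calls.
# `rng` only needs to respond to rand(max); Ruby's Random class does.
def sampled?(percent, rng = Random)
  rng.rand(100.0) < percent
end

# Duty cycle: true during the first `active_seconds` of each minute,
# e.g. 5 seconds per minute.
def in_duty_cycle?(time, active_seconds)
  time.sec < active_seconds
end

# Usage: count how many of 2000 records each counter-based scheme selects.
puts (1..2000).count { |nr| every_nth?(nr, 10) }
puts (1..2000).count { |nr| in_burst?(nr, 10, 1000) }
```

If this went into a ruby filter, I assume the counter would have to live in state set up once at startup and shared between pipeline workers, so without locking the every-Nth and burst counts would only be approximate. The idea would be to set a flag under [@metadata] and wrap the memcached and json filters in a conditional on that flag, rather than dropping anything.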