Multiple ingest pipelines - docs going to the wrong pipeline

Hello,

I've got three ES master/data nodes and one ingest node running Kibana. All servers in the environment run Filebeat for log shipping. I'm seeing a lot of pipeline errors in the Elasticsearch logs for documents that should never have been routed to the pipeline named in the error, so the grok pattern matching fails.

I was seeing similar issues last week running 5.2.2 for ES/Kibana/Filebeat. While troubleshooting today I ended up upgrading all components to 5.3.

It appears that all documents are being routed to one particular pipeline, ignoring the conditionals.

Am I configuring Filebeat incorrectly? I tag each prospector, route docs with those tags to an index, and do the same for the pipeline.

My filebeat.yml:
#=========================== Filebeat prospectors =============================
filebeat.prospectors:
- input_type: log
paths: ["/var/log/syslog"]
tags: ["syslog"]
exclude_lines: ["salt-minion","salt-master"]

- input_type: log
  paths: ["/var/log/kibana/kibana.log"]
  tags: ["kibana"]
  json.message_key: message
  json.keys_under_root: false
  multiline.pattern: '^\s'
  multiline.match: after
  document_type: kibana-logs

- input_type: log
  tags: ["salt"]
  paths:
    - /var/log/salt/master
    - /var/log/salt/api
    - /var/log/salt/minion

- input_type: log
  tags: ["containers"]
  paths: ["/var/log/containers/*.log"]
  symlinks: true
  json.message_key: log
  json.keys_under_root: true
  multiline.pattern: '^\s'
  multiline.match: after
  document_type: kube-logs

- input_type: log
  tags: ["kube"]
  paths:
    - "/var/log/kube*.log"
    - "/var/log/etcd.log"
- input_type: log
  tags: ["haproxy"]
  paths: ["/var/log/haproxy.log"]

#================================ General =====================================
processors:
- drop_fields:
    fields: ["offset", "beat.name", "beat.version"]
#================================ Outputs =====================================

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["logesnode01:9200", "logesnode02:9200", "logesnode03:9200"]
  index: "logs"
  indices:
    - index: "syslog"
      when.contains:
        tags: "syslog"
    - index: "kibana"
      when.contains:
        tags: "kibana"
    - index: "salt"
      when.contains:
        tags: "salt"
    - index: "kube"
      when.contains:
        tags: "kube"
    - index: "containers"
      when.contains:
        tags: "containers"
    - index: "haproxy"
      when.contains:
        tags: "haproxy"
  pipelines:
    - pipeline: "kube-pipeline"
      when.contains:
        tags: "containers"
    - pipeline: "haproxy-pipeline"
      when.contains:
        tags: "haproxy"
    - pipeline: "salt-pipeline"
      when.contains:
        tags: "salt"

Errors:
[2017-03-28T17:30:49,194][DEBUG][o.e.a.b.TransportBulkAction] [logesclient01] failed to execute pipeline [salt-pipeline] for document [salt/log/null]
org.elasticsearch.ElasticsearchException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [ linux-image-4.4.0-64-generic]
at org.elasticsearch.ingest.CompoundProcessor.newCompoundProcessorException(CompoundProcessor.java:156) ~[elasticsearch-5.3.0.jar:5.3.0]
[2017-03-28T17:30:49,195][DEBUG][o.e.a.b.TransportBulkAction] [logesclient01] failed to execute pipeline [salt-pipeline] for document [salt/log/null]
org.elasticsearch.ElasticsearchException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: ['filebeat' changed from 'absent' to '5.3.0']

salt-pipeline:
curl localhost:9200/_ingest/pipeline/salt-pipeline?pretty
{  "salt-pipeline" : {
    "description" : "Salt pipleine",
    "processors" : [ {
        "grok" : {
          "field" : "message",
          "patterns" : [
            "%{TIMESTAMP_ISO8601:timestamp} \\[%{GREEDYDATA:function}\\]\\[%{GREEDYDATA:level}\\]\\[%{GREEDYDATA:process}\\] %{GREEDYDATA:event}"
          ] } },
      { "remove" : { "field" : "message" } } ] } }

All the pipelines listed in the filebeat.yml exist.
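
One way to confirm it's the grok pattern rejecting these lines is to replay a failing value through the pipeline with the simulate API (a minimal sketch; the message value is copied from the first error above):

curl -XPOST 'localhost:9200/_ingest/pipeline/salt-pipeline/_simulate?pretty' -d '
{
  "docs" : [
    { "_source" : { "message" : " linux-image-4.4.0-64-generic" } }
  ]
}'

That value has no leading ISO8601 timestamp, so the %{TIMESTAMP_ISO8601:timestamp} prefix in the pattern can never match it.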

Please use the </> button to format config files and log output!

Your config looks somewhat overcomplicated to me. I'd use fields for setting the pipeline and index name (btw, filebeat 5.3 adds a 'pipeline' setting in the prospector configuration).

e.g.

filebeat.prospectors:
- input_type: log
  paths: ["/var/log/syslog"]
  fields.kind: "syslog"  # <- used to configure the index (can be used for filtering in ES as well)
  exclude_lines: ["salt-minion","salt-master"]
  pipeline: "salt-pipeline"  # <- configure ES pipeline in prospector
- ...

output.elasticsearch:
  hosts: ["logesnode01:9200", "logesnode02:9200", "logesnode03:9200"]
  index: "%{fields.kind:logs}" # <- use 'fields.kind' as index name, with default value 'logs'

Btw, one can change the index name in the ES ingest node as well, as in the sketch below.
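
E.g. a minimal sketch (the target index name here is invented): adding a set processor to a pipeline rewrites the index for every document that runs through it:

{
  "set" : {
    "field" : "_index",
    "value" : "salt-logs"
  }
}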

The error is raised when evaluating the grok expression, i.e. the event does not match your grok pattern. Using the ingest node's error handling, e.g. on_failure at the top level, you can mark/process failed events (e.g. send them to a dead-letter-queue index) for later inspection; see the mysql filebeat module for a real example, and the sketch below.
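
A minimal sketch of that pattern applied to the salt-pipeline above (the error field and the dead-letter index name are placeholders, not taken from the module):

curl -XPUT 'localhost:9200/_ingest/pipeline/salt-pipeline' -d '
{
  "description" : "Salt pipeline",
  "processors" : [
    {
      "grok" : {
        "field" : "message",
        "patterns" : [
          "%{TIMESTAMP_ISO8601:timestamp} \\[%{GREEDYDATA:function}\\]\\[%{GREEDYDATA:level}\\]\\[%{GREEDYDATA:process}\\] %{GREEDYDATA:event}"
        ]
      }
    },
    { "remove" : { "field" : "message" } }
  ],
  "on_failure" : [
    { "set" : { "field" : "error", "value" : "{{ _ingest.on_failure_message }}" } },
    { "set" : { "field" : "_index", "value" : "dead-letter-logs" } }
  ]
}'

Documents that fail the grok then land in dead-letter-logs with the failure message attached, instead of being rejected.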


Thanks for the suggestions! That does look much simpler; I like it.

I have implemented the suggestions, but now the filebeat service will not start.

Error:
2017-03-29T16:21:57Z INFO Setup Beat: filebeat; Version: 5.3.0
2017-03-29T16:21:57Z ERR failed to initialize elasticsearch plugin as output: unsupported format expression "fields.type" in output.elasticsearch.index
2017-03-29T16:21:57Z CRIT Exiting: error initializing publisher: unsupported format expression "fields.type" in output.elasticsearch.index

Config:
#=========================== Filebeat prospectors =============================
filebeat.prospectors:
- input_type: log
paths: ["/var/log/syslog"]
fields.type: "syslog"
exclude_lines: ["salt-minion","salt-master"]
pipeline: "salt-pipeline"

- input_type: log
  fields.type: "salt"
  paths:
    - /var/log/salt/master
    - /var/log/salt/api
    - /var/log/salt/minion
  pipeline: "salt-pipeline"
- ...


#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["logesnode01:9200", "logesnode02:9200", "logesnode03:9200"]
  index: "%{fields.type:logs}"
#================================ Logging =====================================

I also tried the following, thinking it might have been complaining about the default:
index: "%{fields.type}"

Any suggestions? Anything else I can add?

Ah, my fault. The documentation says Format Strings require [] in order to access event fields. That is, this should work: %{[fields.type]:logs}

After looking in formatevents.go, it looks like it needed brackets around fields.type:
index: "%{[fields.type]:logs}"
After applying this change, filebeat is starting!
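
For reference, the working output section now reads:

output.elasticsearch:
  hosts: ["logesnode01:9200", "logesnode02:9200", "logesnode03:9200"]
  index: "%{[fields.type]:logs}"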

Thanks for the direction.
