Error calling pipeline when setting the document ID in beats

I'm facing issues when I try to set the document ID in Filebeat to avoid duplicates.
My Filebeat configuration is as follows:

    filebeat.inputs:
    - type: log
      paths:
        - /dumps-json/wsdumps/**
      multiline.pattern: '(?s)|^}'
      multiline.negate: false
      multiline.match: after
      processors:
        - add_id: ~
        - add_fields:
            fields:
              index: dumps

    output.elasticsearch:
      protocol: https
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      pipelines:
        - pipeline: "%{[fields.index]}"
          mappings:
            dumps: "xmltojson"

With the configuration above, the Elasticsearch ingest pipeline xmltojson is not being executed. Notice the `- add_id: ~` processor.

But if I delete that processor, the ingest pipeline works properly.

The `_id` field should be autogenerated when a new document is created in Elasticsearch, so there shouldn't be any need to set the `add_id` processor at all.
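For reference, the usual way `add_id`-based deduplication is wired up is just the bare processor; per the Filebeat docs it writes a unique ID to `@metadata._id` by default, which the Elasticsearch output then uses as the document `_id`:

```yaml
processors:
  # Generates a unique, Elasticsearch-compatible ID per event. The default
  # target_field is "@metadata._id", which the Elasticsearch output picks up
  # as the document _id, so retried events update instead of duplicating.
  - add_id: ~
```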

Do you see any duplications on ES?

Yes I can see duplicated documents as explained in this section:

Ah okay gotcha! Which errors are you getting when you try to start it up? Or does it just not generate an ID?

You can stop the service and run it with some extra debug logging and in the foreground if you want, with `filebeat -e -d "*"`.

The best places to look for errors are at the start, when it's loading the config, and when it's trying to send events to ES.

If you don't want to run it in the foreground, you can always then just grep the logs for ERR and WARN.
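The log-grepping step can be sketched like this. The log lines below are fabricated for illustration so the snippet runs anywhere; against a real install you would point the same grep at the actual log files (typically under /var/log/filebeat/ on Linux):

```shell
# Filter a Filebeat-style log for errors and warnings. On a real host this
# would be something like:
#   grep -E 'ERR|WARN' /var/log/filebeat/filebeat*
grep -E 'ERR|WARN' <<'EOF'
2021-05-01T10:00:00.000Z INFO  Pipeline loaded
2021-05-01T10:00:01.000Z WARN  Elasticsearch: retrying bulk request
2021-05-01T10:00:02.000Z ERR   Elasticsearch: connection refused
EOF
```

This keeps only the WARN and ERR lines and drops the INFO noise.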

I'm not getting any errors. The documents are being indexed into Elasticsearch, but once I try to set the document ID in Filebeat, the ingest pipeline "xmltojson" is no longer executed. When I delete the add_id processor, the pipeline is launched and the transformation it contains is applied properly.

I am a bit unsure if I understand what you mean; your pipeline is not set to "xmltojson", it is set to a format string:

    pipelines:
      - pipeline: "%{[fields.index]}"
        mappings:
          dumps: "xmltojson"

This configuration won't execute a pipeline named xmltojson directly.
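If the intent is to always run that one pipeline regardless of any event field, a simpler option documented for the Elasticsearch output is to name it directly (a sketch reusing the pipeline name from this thread):

```yaml
output.elasticsearch:
  hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
  # Applies the named ingest pipeline to every event from this Beat,
  # with no dependency on fields.index being set by a processor.
  pipeline: "xmltojson"
```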

We also have a decode_xml processor, if the purpose is to only convert XML to JSON, might that be of interest?
Ref: Decode XML | Filebeat Reference [7.12] | Elastic
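For completeness, a minimal decode_xml sketch based on the 7.12 reference (the field names below are the documented options; values are illustrative):

```yaml
processors:
  # Parses the XML in the 'message' field and writes the decoded result
  # as an object under 'xml'. The ignore_* flags keep events flowing when
  # the field is absent or the XML is malformed.
  - decode_xml:
      field: message
      target_field: xml
      ignore_missing: true
      ignore_failure: true
```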

The pipeline is really not being called.

I have tried that configuration and it worked, depending on the fields.index value set by the add_fields processor above:

    - add_fields:
        fields:
          index: dumps

That configuration works without the add_id processor.

I cannot use decode_xml due to a bug that will be resolved in version 7.13.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.