Error calling pipeline when setting the document ID in beats

Francisco_Peralta_Gu · May 10, 2021, 9:26am

Hi.
I'm having trouble setting the document ID in Filebeat to avoid duplicates.
I have the following Filebeat configuration:

filebeat.inputs:
  - type: log
    paths:
      - /dumps-json/wsdumps/**
    multiline.pattern: '(?s)|^}'
    multiline.negate: false
    multiline.match: after
    processors:
      - add_id: ~
      - add_fields:
          fields:
            index: dumps
....

output.elasticsearch:
  protocol: https
  hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
  username: ${ELASTICSEARCH_USERNAME}
  password: ${ELASTICSEARCH_PASSWORD}
  pipelines:
    - pipeline: "%{[fields.index]}"
      mappings:
        dumps: "xmltojson"

With the configuration above, the Elasticsearch ingest pipeline xmltojson is not being run. Note the - add_id: ~ processor.

If I delete that processor, the ingest pipeline works properly.

Marius_Iversen · May 10, 2021, 11:52am

The ID field should be autogenerated when a new document is created in Elasticsearch, so there shouldn't be any need to set the add_id processor anymore.

Do you see any duplicates in ES?

Francisco_Peralta_Gu · May 10, 2021, 12:49pm

Yes, I can see duplicated documents, as explained in this section:

Marius_Iversen · May 10, 2021, 12:54pm

Ah okay gotcha! Which errors are you getting when you try to start it up? Or does it just not generate an ID?

If you want, you can stop the service and run it in the foreground with extra debug logging, using filebeat -e -d "*".

The best places to look for errors are at startup, when it's loading the config, and when it is trying to send files to ES.

If you don't want to run it in the foreground, you can always just grep the logs for ERR and WARN.
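
If grep isn't handy, the same filter can be sketched in a few lines of Python. This is only an illustration: problem_lines is a hypothetical helper name, and the sample lines are made-up placeholders, not real Filebeat output.

```python
def problem_lines(lines, tags=("ERR", "WARN")):
    """Return the lines that contain any of the given level tags.

    Plain substring matching, like grepping for ERR and WARN,
    so "ERR" also matches "ERROR".
    """
    return [line.rstrip("\n") for line in lines if any(tag in line for tag in tags)]


# Made-up placeholder lines for illustration only (NOT real Filebeat output).
sample = [
    "INFO example line\n",
    "WARN example line\n",
    "ERROR example line\n",
]

for line in problem_lines(sample):
    print(line)
```

To use it on a real log, pass an open file object instead of the sample list; where the log file lives depends on how Filebeat was installed.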

Francisco_Peralta_Gu · May 10, 2021, 12:59pm

I'm not getting any errors. The documents are being indexed into Elasticsearch, but once I try to set the document ID in Filebeat, the ingest pipeline "xmltojson" is no longer executed. When I delete the add_id processor, the pipeline runs and its transformation is applied properly.

Marius_Iversen · May 10, 2021, 1:23pm

I'm not sure I understand what you mean. Your pipeline is not set to "xmltojson"; it is set to a different kind of value:

  pipelines:
    - pipeline: "%{[fields.index]}"
      mappings:
        dumps: "xmltojson"

This configuration won't execute a pipeline named xmltojson.

We also have a decode_xml processor. If the purpose is only to convert XML to JSON, might that be of interest?
Ref: Decode XML | Filebeat Reference [7.12] | Elastic

Francisco_Peralta_Gu · May 10, 2021, 1:30pm

The pipeline really is not called.

I have tried that configuration and it works: the pipeline is selected from the fields.index value set by the add_fields processor above:

  - add_fields:
      fields:
        index: dumps

That configuration works without the add_id processor.
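
To illustrate how that pipelines rule is expected to resolve, here is a simplified Python model of it. It is a sketch under stated assumptions, not Filebeat's actual implementation: resolve_pipeline is a hypothetical name, and the model relies only on add_fields (with no target set) placing the new keys under fields, and on mappings translating the expanded value into a pipeline name.

```python
import re


def resolve_pipeline(event, selector, mappings, default=None):
    """Simplified model of one `pipelines` rule with `mappings`.

    Hypothetical helper for illustration; NOT Filebeat's real code.
    """

    def lookup(match):
        # Walk the nested event dict along the dotted field path.
        value = event
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    try:
        # Expand every %{[a.b]} reference in the selector string.
        key = re.sub(r"%\{\[([^\]]+)\]\}", lookup, selector)
    except (KeyError, TypeError):
        return default  # assumption: a missing field means the rule does not match
    return mappings.get(key, default)


# add_fields without a `target` nests the new keys under `fields`.
event = {"message": "...", "fields": {"index": "dumps"}}
print(resolve_pipeline(event, "%{[fields.index]}", {"dumps": "xmltojson"}))
```

The model says nothing about why adding add_id changes the outcome; it only shows that the selector itself should resolve to xmltojson for an event carrying fields.index: dumps.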

I cannot use decode_xml because of a bug that will be fixed in version 7.13.

system · June 7, 2021, 3:30pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
