I need to filter events coming from the same source at the Filebeat level and tag them (filtred / not_filtred, for example) before sending them to Logstash.
And I wonder if there are options to duplicate events (like the clone filter in Logstash) without duplicating the configuration, like this:
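For reference, the duplicated-input Filebeat setup being described would look roughly like this (a hypothetical sketch; the input type, ids, and paths are placeholders, and the include/exclude patterns follow the behaviour described later in the thread):

```
filebeat.inputs:
  # First copy: only REQUEST/RESPONSE lines, tagged "filtred"
  - type: filestream
    id: app-logs-filtred
    paths:
      - /var/log/app/*.log        # placeholder path
    include_lines: ['REQUEST', 'RESPONSE']
    tags: ["filtred"]

  # Second copy of the SAME files: everything except DEBUG, tagged "not_filtred"
  - type: filestream
    id: app-logs-not-filtred
    paths:
      - /var/log/app/*.log        # placeholder path
    exclude_lines: ['DEBUG']
    tags: ["not_filtred"]
```

Every option except the tag and the include/exclude patterns has to be repeated, which is exactly the duplication in question.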
I am really curious why you need to do this before logstash and cannot do this in logstash...
This would only make sense if you need to split these events and send them to two different Logstash endpoints...
Can you elaborate on this?
The source is a k8s cluster in the cloud and the ELK stack is on premise, so I need to filter events and duplicate sources for two reasons:
the cost of transferring logs from the cloud to the on-premise site, so I don't need to transfer all logs down.
I need two types of events: one with RESPONSE and REQUEST lines, for which some processing is configured in Logstash; the other without DEBUG events, which is not processed but stored as-is in Elasticsearch.
If I understand correctly, your current example configuration sends events containing 'RESPONSE' and 'REQUEST' with the tag 'filtered', and all lines excluding 'DEBUG' with the tag 'not_filtered'.
This means that all lines except 'DEBUG' are being sent anyway.
Why not use only the second part of your config and add the 'filtered' or 'not_filtered' tag afterwards in Logstash? This way you even limit the amount of data going from your k8s cluster to your on-prem environment, because you only send events once... right?
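A minimal sketch of doing that tagging in Logstash instead (the tag names mirror the ones used elsewhere in this thread; the substring checks on [message] are an assumption about the log format):

```
filter {
  if [message] =~ /REQUEST/ or [message] =~ /RESPONSE/ {
    mutate { add_tag => ["filtred"] }
  } else {
    mutate { add_tag => ["not_filtred"] }
  }
}
```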
But in Logstash we need to send events to two different indices based on tags, so on the Logstash side the configuration looks like this:
input {
  # beats input configuration goes here
}
filter {
  if "filtred" in [tags] {
    ## do some parsing, processing and so on
  }
}
output {
  if "filtred" in [tags] {
    elasticsearch {
      hosts => ["es-host"]
      index => "filtred-index"
    }
  }
  if "not_filtred" in [tags] {
    elasticsearch {
      hosts => ["es-host"]
      index => "not-filtred-index"
    }
  }
}
@Mhag It's difficult without an actual example or sanitized data, but you could use a processor in Filebeat with a regex condition to add the "filtered" tag to lines containing "RESPONSE" or "REQUEST".
You could also do this in Logstash.
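In Filebeat, that processor could be sketched with the add_tags processor under a regexp condition (the field name and patterns are assumptions about the log format):

```
processors:
  - add_tags:
      tags: ["filtred"]
      when:
        or:
          - regexp:
              message: 'REQUEST'
          - regexp:
              message: 'RESPONSE'
```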
Then you would use a negative filter to send data to your unfiltered index with:
output {
  if "filtred" in [tags] {
    elasticsearch {
      hosts => ["es-host"]
      index => "filtred-index"
    }
  }
  if "filtred" not in [tags] {
    elasticsearch {
      hosts => ["es-host"]
      index => "not-filtred-index"
    }
  }
}
OR
output {
  if "filtred" in [tags] {
    elasticsearch {
      hosts => ["es-host"]
      index => "filtred-index"
    }
  } else {
    elasticsearch {
      hosts => ["es-host"]
      index => "not-filtred-index"
    }
  }
}
Because now you have both "filtered" and "not_filtered" events in your "not-filtered-index", I think, unless that is on purpose.
Note though that it is better to have an explicit assignment for your not_filtered events: if you have multiple inputs outside of this scope, then events from those inputs might end up in your "not-filtred-index" because they don't contain the "filtred" tag.
Side note: Not sure if you mean filtre (French?) or filter. Just make sure it is consistent, typos are my nemesis
Because now you have both "filtered" and "not_filtered" events in your "not-filtered-index", I think, unless that is on purpose.
The goal is to have all events in the not_filtered index, except those with the word "DEBUG," while the filtered index will only have events with the words "RESPONSE" and "REQUEST."
I think I will go with your first proposal, so in the Filebeat config, without tagging events:
At the Logstash level I will do this by cloning events and filtering on tags and the words REQUEST and RESPONSE:
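A sketch of that Filebeat config (input type, id, and paths are placeholders): only DEBUG lines are dropped at the edge, and no tags are added:

```
filebeat.inputs:
  - type: filestream
    id: app-logs
    paths:
      - /var/log/app/*.log      # placeholder path
    # ship everything except DEBUG lines, untagged, once
    exclude_lines: ['DEBUG']
```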
input {
  # Your input configuration goes here
}
filter {
  clone {
    # With ECS compatibility enabled (the default since Logstash 8) the
    # clone filter adds each clone's name to [tags]; with ECS disabled
    # it is written to the [type] field instead, and the conditionals
    # below would have to test [type] rather than [tags].
    clones => ["not_filtered", "filtered"]
  }
  # The original event carries neither clone tag; drop it so it is not
  # processed and then silently discarded at the output stage.
  if "filtered" not in [tags] and "not_filtered" not in [tags] {
    drop {}
  }
  if "filtered" in [tags] {
    if "REQUEST" in [message] or "RESPONSE" in [message] {
      ## do some parsing, processing and so on
    } else {
      drop {}
    }
  }
}
output {
  if "filtered" in [tags] {
    elasticsearch {
      hosts => ["es-host"]
      index => "filtered-index"
    }
  }
  if "not_filtered" in [tags] {
    elasticsearch {
      hosts => ["es-host"]
      index => "not-filtered-index"
    }
  }
}
This way I don't need to duplicate config at the Filebeat level, and I will transfer events only once from the cloud to on premise.
Yeah, and thanks for spotting my typos, that's my French invading my English.