How to use an if condition to filter spider events gracefully in Logstash

I can use drop_event in Filebeat to filter out spider logs, like the following:

  - drop_event:
      when:
        or:
          - contains:
              message: FacebookBot
          - contains:
              message: Googlebot
            ...

How can I do this in Logstash? There are many spider events, so how can I use an if/or condition like the following:

if ["TwitterBot","FacebookBot","Googlebot","AppleBot","xxx","xxxx"] in [message] {
    drop {}
}

It doesn't work.

See here.
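
For readers who cannot follow the link, one common pattern (not necessarily what the linked answer describes) is a single regular-expression match on the message, with the bot names as alternatives. A minimal sketch, reusing the names from the question:

```
filter {
  # Drop the event if the raw message contains any of the listed names.
  # Extend the alternation with further bot names as needed.
  if [message] =~ /TwitterBot|FacebookBot|Googlebot|AppleBot/ {
    drop {}
  }
}
```

The match is case-sensitive and unanchored, so each name has to be spelled exactly as it appears in the logs.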


Thanks, it's working!

For a more complete solution around identifying bots, try using the 'useragent' filter. One of the fields that it will add to your events is 'device', which typically shows various phone models, 'Other' for typical web browsers, and 'Spider'.

I've used it in production for many years; it's one of the most useful plugins for triaging web-server issues.
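
A minimal sketch of that approach follows. The source field name `agent` and the target `[ua]` are placeholders, not names taken from this thread, and the sketch assumes the legacy (non-ECS) field layout in which 'device' is a plain string; under ECS compatibility the device name is expected one level deeper, at `[ua][device][name]`.

```
filter {
  # Parse the raw user-agent string (assumed to be in the "agent" field)
  # and store the parsed fields under the nested [ua] object.
  useragent {
    source => "agent"
    target => "[ua]"
  }

  # Drop events that the parser classified as coming from a crawler.
  if [ua][device] == "Spider" {
    drop {}
  }
}
```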


Thanks, this is the perfect solution right now.
I was already using the useragent filter, but I had not noticed client.agent.device; it already identifies spiders automatically for me.
Thanks again.

But I'm trying it like this:

if [client.agent] != "-" {
    useragent {
        source => "useragent"
        target => "client.agent"
    }
}

Now I can see "iOS", "Other", and "Spider" in the "client.agent.device" field.
When I add a filter like the following, it doesn't work:

if [client][agent][device] =~ "Spider" {
    drop {}
}

What does an entire record look like? Can you share the entire filter{} so we can see these pieces in context?
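
Without the full config this is only a guess, but one possible cause is the field reference. In Logstash, a target of "client.agent" names a single top-level field whose name contains a dot, whereas `[client][agent][device]` refers to a nested object, so the two would not match even though both can display as client.agent.device downstream. A sketch that keeps both sides consistent, assuming the raw user-agent string lives in the `useragent` field and that 'device' is a plain string (non-ECS layout):

```
filter {
  # Skip parsing when the user-agent is the "-" placeholder.
  if [useragent] != "-" {
    useragent {
      source => "useragent"
      # Bracket syntax creates a nested object: client -> agent -> ...
      target => "[client][agent]"
    }
  }

  # The same nested path is used when testing the parsed value.
  if [client][agent][device] == "Spider" {
    drop {}
  }
}
```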

PS. I would suggest it's most useful to keep the spider activity and filter it out in Kibana etc. That gives much greater visibility into what effect spiders are having (particularly for correlating outages and for performance analysis). It would also give you information for making informed decisions about rate-limiting based on things like user agents.
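
As a rough sketch of keeping the events rather than dropping them, the conditional can add a tag that is then easy to filter on in Kibana. The field path `[ua][device]` and the tag name "spider" are placeholders chosen for illustration; substitute whatever path your useragent target actually produces.

```
filter {
  # Mark crawler traffic instead of discarding it, so it stays
  # searchable and can be excluded (or isolated) at query time.
  if [ua][device] == "Spider" {
    mutate {
      add_tag => ["spider"]
    }
  }
}
```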
