For a more complete solution to identifying bots, try the 'useragent' filter. One of the fields it adds to your events is 'device', which typically holds a phone model, 'Other' for ordinary web browsers, or 'Spider' for crawlers.
I've used it in production for many years; it's one of the most useful plugins for triaging web-server issues.
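As a rough sketch of what that looks like in a pipeline (the field names `agent` and `ua` are assumptions for illustration; use whichever field holds the raw User-Agent string in your events):

```
filter {
  useragent {
    # Assumption: the raw User-Agent header is stored in a field called "agent"
    source => "agent"
    # Assumption: write the parsed fields under "ua" rather than the event root
    target => "ua"
  }
}
```

Where the device value lands depends on the plugin version and its ECS compatibility setting: in legacy mode it should appear as `[ua][device]`, while in ECS mode it is nested as `[ua][device][name]`. Check one parsed event to confirm which applies to you.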
Thanks, this is the perfect solution for now.
I was already using the useragent filter, but I had not noticed the client.agent.device field; it already identifies spiders automatically for me.
Thanks again.
What does an entire record look like? Can you share the entire filter{} so we can see these pieces in context?
PS. I would suggest it's most useful to keep the spider activity and filter it out in Kibana instead. That gives much greater visibility into what effect spiders are having, particularly for correlating outages and for performance analysis. It would also give you the information to make informed decisions about rate-limiting based on things like user-agents.
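One way to do that, as a minimal sketch, is to tag spider events rather than drop them. The field path `[ua][device]` and the tag name `spider` are assumptions here; substitute whatever your useragent filter actually writes (for example `[client][agent][device]`, or a nested `[...][device][name]` in ECS mode):

```
filter {
  # Assumption: the useragent filter has already run and wrote the device to [ua][device]
  if [ua][device] == "Spider" {
    mutate {
      # Keep the event, but mark it so it can be included or excluded at query time
      add_tag => ["spider"]
    }
  }
}
```

In Kibana you can then exclude crawler traffic with a query such as `NOT tags:spider`, or chart only the tagged events to see how much load the spiders generate.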