Your pattern only works if there are two IP addresses present, but the input line that fails has only one. To solve this you can, for example, specify two patterns; grok will try them in order and use the first one that matches.
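To illustrate the "try the patterns in order, use the first match" idea outside of Logstash, here is a minimal Python sketch. It is not grok configuration: the regexes are simplified stand-ins for grok's IP and hostname patterns, and the sample log lines, the `clientip2` field name, and the `first_match` helper are hypothetical.

```python
import re

# Simplified stand-in for grok's IP pattern (assumption: IPv4 only, no range checks).
IP = r"\d{1,3}(?:\.\d{1,3}){3}"

# Ordered from most specific to least specific, as you would list grok patterns.
PATTERNS = [
    # Virtual host, then two client addresses separated by a comma.
    re.compile(rf"^(?P<virtualhost>\S+) +(?P<clientip>{IP}),(?P<clientip2>{IP}) "),
    # Virtual host, then a single client address.
    re.compile(rf"^(?P<virtualhost>\S+) +(?P<clientip>{IP}) "),
]

def first_match(line):
    """Return the named groups of the first pattern that matches, else None."""
    for pattern in PATTERNS:
        m = pattern.match(line)
        if m:
            return m.groupdict()
    return None

# Hypothetical sample lines: one with two addresses, one with a single address.
two = first_match("www.example.com  10.0.0.1,192.168.0.7 - - [rest of line]")
one = first_match("www.example.com  10.0.0.1 - - [rest of line]")
print(two)
print(one)
```

The line with a single address fails the first pattern (there is no comma after the address) and falls through to the second one, which is the behaviour described above.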
The following works and captures the virtualhost and both client IP addresses:
Thank you for responding. I've been trying to get a handle on the grok pattern format. With this in mind, instead of using the built-in grok pattern %{COMBINEDAPACHELOG}, I'm trying to create my own, just to verify my understanding. With my own patterns I still get grok parse failures in the config test I'm running. Could you look at my patterns and tell me where I'm going wrong?
Pay attention to the whitespace. Your example log entries have no space after the comma that separates the two IP addresses, but there are two spaces after the hostname (i.e. before the IP addresses).
I feel like I'm close to understanding the syntax. But I don't get why you have the + sign in front of %{COMBINEDAPACHELOG} and in front of %{IP:clientip},%{COMBINEDAPACHELOG}. Do you need the plus sign to handle spaces in the expression?
In regular expressions, a plus sign means "one or more occurrences of the preceding token". In this case the preceding token is a space, so it's a way to be more lax about the number of spaces and handle one, two (or ten) of them.
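A small Python sketch of the same idea, using plain regular expressions rather than grok. The `host <number>` lines are made-up test strings; only the behaviour of the space followed by `+` is the point here.

```python
import re

strict = re.compile(r"^host \d+$")   # exactly one space between the tokens
lax = re.compile(r"^host +\d+$")     # one or more spaces between the tokens

for line in ["host 1", "host  1", "host          1", "host1"]:
    # Show whether each pattern accepts the line.
    print(repr(line), "strict:", bool(strict.match(line)), "lax:", bool(lax.match(line)))
```

The strict pattern accepts only the line with a single space, while the lax one accepts any line with at least one space. Neither accepts `host1`, because `+` still requires at least one occurrence.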