I'd like to split the following field into all of its components ("uri_domain, uri_path, uri_root"). I would also like to extract the query string into subfields, like "query_params: application, inf.name".
For example, how do I expand URIPATHPARAM into separate URIPATH and URIPARAM fields? And why aren't all of them expanded when I use just "URI" with these GROK definitions?
```
# paths
PATH (?:%{UNIXPATH}|%{WINPATH})
UNIXPATH (/([\w_%!$@:.,~-]+|\\.)*)+
TTY (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))
WINPATH (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
URIPROTO [A-Za-z]+(\+[A-Za-z+]+)?
URIHOST %{IPORHOST}(?::%{POSINT:port})?
# uripath comes loosely from RFC1738, but mostly from what Firefox
# doesn't turn into %XX
URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+
#URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)?
URIPARAM \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*
URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?
```
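Grok only creates a field for a sub-pattern that is given a name with the %{PATTERN:name} syntax. In the definitions above, the only named piece inside URI is %{POSINT:port} (via URIHOST), so %{URI:uri} gives you uri and port but none of the other components. One way around this is to rebuild the URI pattern from the same building blocks and name each piece yourself. A minimal sketch, assuming the field holds only the URL and sits in message; the field names (uri_proto, uri_user, uri_domain, uri_port, uri_path, uri_query) are illustrative, not required:

```
filter {
  grok {
    # Same structure as the URI pattern, but every component gets a name.
    # IPORHOST and the port are split so uri_domain holds only the host.
    # URIPARAM includes the leading "?", so uri_query will start with "?".
    match => {
      "message" => "%{URIPROTO:uri_proto}://(?:%{USER:uri_user}(?::[^@]*)?@)?(?:%{IPORHOST:uri_domain}(?::%{POSINT:uri_port})?)?(?:%{URIPATH:uri_path})?(?:%{URIPARAM:uri_query})?"
    }
  }
}
```

Because every group after "://" is optional, the pattern still matches URLs without a user, port, path or query; the corresponding fields are simply not created.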
If you run this GROK statement on your uri_param field, you can get these results: it breaks the field up into two separate components. I have no idea whether it is the best method, but it has worked OK for us.
Now we can take the uri_query field generated above (with some parameters added): {"uri_query":"application=&inf.name=eth0&test1=blah&test2=blahblahblah"}
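As a sketch of one way to do this split (not necessarily the exact statement meant here): URIPATH stops at the first "?" because "?" is not in its character class, and everything after it can be captured as the query string. The target names uri_path and uri_query are the ones used in this thread:

```
filter {
  grok {
    # URIPATH cannot contain "?", so it ends where the query string begins;
    # GREEDYDATA (.*) then captures everything after the "?".
    # Note: this pattern requires a "?" and will not match a bare path.
    match => { "uri_param" => "%{URIPATH:uri_path}\?%{GREEDYDATA:uri_query}" }
  }
}
```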
And when we run it through that KV filter we get:
You can then drop the original uri_query field or keep it around. If you want to limit which fields get created, use the include_keys option so that you don't accidentally create hundreds of new fields in your cluster.
I found the "target" parameter to put the query string into a container. Exactly what I needed. I couldn't put all my "match" statements in the same grok, though; they had to be separated...
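A minimal kv filter sketch for the uri_query example above, assuming the query string is in uri_query. source, field_split, value_split, target and include_keys are standard kv options; query_params is just the container name from the original question, and remove_field is optional:

```
filter {
  kv {
    # Parse the query string, e.g. application=&inf.name=eth0&test1=blah
    source       => "uri_query"
    # Each character is a separate delimiter: split pairs on "&",
    # and also on a leading "?" if the query string still has one.
    field_split  => "&?"
    value_split  => "="
    # Put the parsed keys under [query_params] instead of the event root.
    target       => "query_params"
    # Only create these keys; test1, test2, etc. are ignored.
    include_keys => [ "application", "inf.name" ]
    # Optional: drop the raw string once it has been parsed.
    remove_field => [ "uri_query" ]
  }
}
```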
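One likely reason the match statements had to be separated: grok tries its match entries in order and, with the default break_on_match => true, stops after the first successful match, so later entries in the same filter are skipped. Chaining separate grok filters (or setting break_on_match => false) avoids that. A sketch, assuming the full URL is in message; the field names are illustrative:

```
filter {
  # First grok: take the host and the path-plus-query part of the URL.
  grok {
    match => { "message" => "%{URIPROTO}://%{IPORHOST:uri_domain}(?::%{POSINT:uri_port})?%{URIPATHPARAM:uri_param}" }
  }
  # Second grok: runs on the field produced by the first one and splits it.
  grok {
    match => { "uri_param" => "%{URIPATH:uri_path}(?:%{URIPARAM:uri_query})?" }
  }
}
```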
I think that, for whatever reason, the URI Grok statement just wasn't designed to work the way you or I want it to.
A comment on the second GROK statement.
As long as there are query parameters (anything after the question mark), this will work fine.
However, if you have a URI without a query string, this GROK statement will not match. That means that neither uri_query nor uri_path will be populated. A missing uri_query is expected, since there is no query, but depending on your usage you may still want uri_path to show up. In that case it would hold the same value as uri_param, but for aggregations or searching that may not matter.
If you want uri_path to be populated regardless, you can try using this version: if the first pattern doesn't match, grok moves on to the second one. You could also use conditionals to check whether the field is missing and add it, but this method seems pretty simple.
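A sketch of both approaches, assuming the same uri_param field (not necessarily the exact version referred to above). grok accepts a list of patterns for one field and, with the default break_on_match => true, stops at the first one that matches; the conditional variant copies uri_param into uri_path only when uri_path was not created:

```
filter {
  # Option 1: a list of patterns, tried in order.
  # The first requires a query string; the second matches a bare path.
  grok {
    match => {
      "uri_param" => [
        "%{URIPATH:uri_path}\?%{GREEDYDATA:uri_query}",
        "%{URIPATH:uri_path}"
      ]
    }
  }

  # Option 2: fill in uri_path afterwards if it is still missing.
  if ![uri_path] {
    mutate {
      add_field => { "uri_path" => "%{uri_param}" }
    }
  }
}
```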
Hey, thanks for replying. I was able to figure that out later.
Here is the link to my post about the issue I had, in case anyone is still looking for a solution to a similar problem.