Hello,
I'm trying to extract fields from log messages while reusing a set of token patterns, but sometimes my regexps are too greedy and capture past the closing brace of a token.
I have a few different types of messages. Some tokens appear in several message types, others do not, and their order can differ from one type to another.
Tokens are formatted as: PropertyName={property value}
Here is an example of two messages:
INFO com.l7tech.server.policy.assertion.ServerAuditDetailAssertion: -4: Message={Ouverture session}, MessageType={TypeMessageAuthentification}, Username={etud1}, SessionAuthorization={5af2143d-7368-46de-955e-2a014bd7d39f}, SessionClient={ZxFTldw1Ed242t1L7dML8cIgkM}, RequestId={00000154e84e6393-11147}, ServiceName={auth/oauth/v2/authorize}, ServiceId={7aea4881665af7743edf0dcb0d8ddfef}, ServiceGuid={304e9de3-ba27-4260-b448-e3476530a0c2}, ServiceVersion={54}, ClusterNodeName={Gateway1}
INFO com.l7tech.server.policy.assertion.ServerAuditDetailAssertion: -4: Message={Fermeture session}, MessageType={TypeMessageAuthentification}, SessionClient={p051Q3ZdhoCaNo9ASMp11uEhXHU}, RequestId={00000154e84e6393-1100b}, ServiceName={UL Page Logout}, ServiceId={7aea4881665af7743edf0dcb0d924d36}, ServiceGuid={b931e09d-9eaf-4e43-a14a-120815141b5d}, ServiceVersion={59}, ClusterNodeName={Gateway1}
To keep things readable and fully reusable, I have defined one pattern for each recurring token:
##################
# General patterns: the part that is identical at the beginning of every message
UL_GSA_BASE %{LOGLEVEL}%{SPACE}%{JAVACLASS}: %{INT}:
UL_GSA_COMMON %{LOGLEVEL}%{SPACE}%{JAVACLASS}: %{INT}: %{UL_GSA_MESSAGE}, %{UL_GSA_MESSAGE_TYPE}
UL_GSA_COMMON_REMAINING %{UL_GSA_COMMON}, %{GREEDYDATA:remaining}
##################
# Reusable patterns
UL_GSA_MESSAGE Message={(?<message>.*?)}
UL_GSA_MESSAGE_TYPE MessageType={(?<message_type>.*?)}
UL_GSA_USERNAME Username={(?<username>.*?)}
UL_GSA_SESSION_CLIENT SessionClient={(?<session_client>.*?)}
UL_GSA_SESSION_AUTHORIZATION SessionAuthorization={(?<session_authorization>.*?)}
UL_GSA_SSO SSO={(?<sso>.*?)}
UL_GSA_REQUEST_ID RequestId={(?<request_id>.*?)}
UL_GSA_API_KEY APIKey={(?<api_key>.*?)}
UL_GSA_IP_ADDRESSES IpAddresses={(?<ip_addresses>.*?)}
UL_GSA_RESPONSE_CODE ResponseCode={(?<response_code>.*?)}
UL_GSA_RESPONSE_LATENCY ResponseLatency={(?<response_latency>.*?)}
UL_GSA_RESPONSE_ERROR ResponseError={(?<response_error>.*?)}
UL_GSA_SERVICE_ID ServiceId={(?<service_id>.*?)}
UL_GSA_SERVICE_NAME ServiceName={(?<service_name>.*?)}
UL_GSA_SERVICE_GUID ServiceGuid={(?<service_guid>.*?)}
UL_GSA_SERVICE_VERSION ServiceVersion={(?<service_version>.*?)}
UL_GSA_GATEWAY_HOST GatewayHost={(?<gateway_host>.*?)}
UL_GSA_GATEWAY_SERVICE_URL GatewayServiceUrl={(?<gateway_service_url>.*?)}
UL_GSA_CLUSTER_NODE_NAME ClusterNodeName={(?<cluster_node_name>.*?)}
Then I build the final patterns by combining the needed parts in the right order:
##################
# Specific patterns
UL_GSA_OUVERTURE_SESSION %{UL_GSA_COMMON}, %{UL_GSA_USERNAME}, %{UL_GSA_SESSION_AUTHORIZATION}, %{UL_GSA_SESSION_CLIENT}, %{UL_GSA_REQUEST_ID}, %{UL_GSA_SERVICE_NAME}, %{UL_GSA_SERVICE_ID}, %{UL_GSA_SERVICE_GUID}, %{UL_GSA_SERVICE_VERSION}, %{UL_GSA_CLUSTER_NODE_NAME}
UL_GSA_FERMETURE_SESSION %{UL_GSA_COMMON}, %{UL_GSA_SESSION_CLIENT}, %{UL_GSA_REQUEST_ID}, %{UL_GSA_SERVICE_NAME}, %{UL_GSA_SERVICE_ID}, %{UL_GSA_SERVICE_GUID}, %{UL_GSA_SERVICE_VERSION}, %{UL_GSA_CLUSTER_NODE_NAME}
On the http://grokdebug.herokuapp.com/ site, testing each message type individually with its corresponding pattern works fine.
But when I test the message
INFO com.l7tech.server.policy.assertion.ServerAuditDetailAssertion: -4: Message={Ouverture session}, MessageType={TypeMessageAuthentification}, Username={etud1}, SessionAuthorization={5af2143d-7368-46de-955e-2a014bd7d39f}, SessionClient={ZxFTldw1Ed242t1L7dML8cIgkM}, RequestId={00000154e84e6393-11147}, ServiceName={auth/oauth/v2/authorize}, ServiceId={7aea4881665af7743edf0dcb0d8ddfef}, ServiceGuid={304e9de3-ba27-4260-b448-e3476530a0c2}, ServiceVersion={54}, ClusterNodeName={Gateway1}
with the non-matching pattern %{UL_GSA_FERMETURE_SESSION}, my UL_GSA_MESSAGE_TYPE pattern seems to be too greedy and captures too much, whereas I would like the match to fail.
The captured message_type field then looks like this:
{
"message": [
[
"Ouverture session"
]
],
"message_type": [
[
"TypeMessageAuthentification}, Username={etud1}, SessionAuthorization={5af2143d-7368-46de-955e-2a014bd7d39f"
]
],
...
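To check that this is not specific to the grok debugger, here is a minimal reproduction outside Logstash. It is only a sketch: it uses Python's re module rather than grok's Oniguruma engine (I am assuming both handle the lazy quantifier the same way), the message is shortened, and the variable names are mine. It suggests that `.*?` is lazy but will still expand across closing braces whenever that is the only way for the rest of the pattern to match.

```python
import re

# Shortened "Ouverture session" message: fewer tokens and shorter values, same shape.
line = (
    "Message={Ouverture session}, MessageType={TypeMessageAuthentification}, "
    "Username={etud1}, SessionAuthorization={5af2143d}, SessionClient={ZxFTldw1}"
)

# Same shape as the start of UL_GSA_FERMETURE_SESSION:
# SessionClient is expected immediately after MessageType.
# Python writes named groups as (?P<name>...) where grok uses (?<name>...).
lazy = re.compile(
    r"Message=\{(?P<message>.*?)\}, "
    r"MessageType=\{(?P<message_type>.*?)\}, "
    r"SessionClient=\{(?P<session_client>.*?)\}"
)

m = lazy.search(line)

# The lazy group keeps growing until "}, SessionClient={" finally follows it,
# so it swallows the Username and SessionAuthorization tokens.
print(m.group("message_type"))
# prints: TypeMessageAuthentification}, Username={etud1}, SessionAuthorization={5af2143d
```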
So I have tried a lot of things...
How can I make each capture stay strictly within a single pair of curly braces?
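One idea I have not yet validated in Logstash is to replace `.*?` with a negated character class, `[^}]*`, so that a capture can never cross a closing brace. The sketch below tries this with Python's re module on shortened messages. It assumes grok's regex engine treats `[^}]*` the same way, and the names `strict`, `ouverture` and `fermeture` are only for illustration.

```python
import re

# Shortened versions of the two message types (same token order as the real ones).
ouverture = (
    "Message={Ouverture session}, MessageType={TypeMessageAuthentification}, "
    "Username={etud1}, SessionAuthorization={5af2143d}, SessionClient={ZxFTldw1}"
)
fermeture = (
    "Message={Fermeture session}, MessageType={TypeMessageAuthentification}, "
    "SessionClient={p051Q3Zd}"
)

# [^}]* matches any run of characters except "}", so each capture
# has to stop at the first closing brace it meets.
strict = re.compile(
    r"Message=\{(?P<message>[^}]*)\}, "
    r"MessageType=\{(?P<message_type>[^}]*)\}, "
    r"SessionClient=\{(?P<session_client>[^}]*)\}"
)

# The "Ouverture" message no longer matches the "Fermeture"-shaped pattern.
print(strict.search(ouverture))  # prints: None

# The "Fermeture" message still matches, with a clean message_type.
print(strict.search(fermeture).group("message_type"))  # prints: TypeMessageAuthentification
```

If this carries over to grok, each reusable pattern would become something like MessageType={(?<message_type>[^}]*)}. The obvious limitation is that a property value containing a literal "}" could no longer be captured in full.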
Thanks a lot