Filtering json with embedded xml

I'm pulling json messages from a jms queue, where most of them filter properly. However there are specific messages with XML payload. What would be the best way to filter this type of message?

{
       "envCode": "RWCS",
       "appCode": "TIBCO",
       "type": "INFO",
       "hostName": "",
       "interfaceCode": "TIB_EXP.EMAIL",
       "interfaceName":"EMAIL",
       "transactionID":"6130814352546724324134",
       "transactionDomain":"",
       "transactionType":"",
       "brand":"TST",
       "timestamp": "2018-06-16T15:40:47.225-07:00",
       "auditStep": "REPORT", 
       "message": "GET TOKEN PROCESS SUCCESS",
       "payload":"<?xml version="1.0" encoding="UTF-8"?>
    <RequestActivityOutput><Header>HTTP/1.1 200 OK
    Content-Type: application/json; charset=utf-8
    Date: Wed, 13 Jun 2018 22:40:47 GMT
    cache-control: no-cache, no-store, must-revalidate
    connection: close
    expires: 0
    x-powered-by: ASP.NET
    set-cookie: ASP.NET_SessionId=fdsafsaf40pu2mgwtzwm; path=/; secure; HttpOnly
    set-cookie: xyz_adm_adm_org_user_language_code=en; expires=Tue, 11-Sep-2018 22:40:48 GMT; path=/
    set-cookie: xyz_adm_adm_org_user_locale_code=en-US; expires=Tue, 11-Sep-2018 22:40:48 GMT; path=/
    set-cookie: xyz_adm_adm_org_user_time_zone_id=; expires=Tue, 11-Sep-2018 22:40:48 GMT; path=/
    pragma: no-cache
    </Header><statusLine><httpVersion>HTTP/1.1</httpVersion><statusCode>200</statusCode><reasonPhrase>OK</reasonPhrase></statusLine><binaryContent>eyJhY2Nlc3NfdG9rZW4iOiJBQUVBQUZvS0Z1emFXOWtSbmhtemtrMVNESmM3bWxhZDU0TWZjMlM4aVZKeE1SV2I4YklkYzVBTk9KZFN3WUpER3MwX2pGYXZWaGRVQ3dOUGZnelpmWFRhVU5gfdsRl9KOXFNUUxzYncwSHc1ZEdlcm90azh3RXNtSWpmY0V4bFRRNndiRnBJNEVCTmRlWUtVUVpkUXJWRldxVXNNSHVISTg4OVhFaHFxU1dnOWFxbU04cE1tWkM4OWVMbkZHczJpdi0yUEJ2cjhETWgyQ2lyVXYxTlM5X1F3RmtQdU1GYXo3cloya1B4RnBTX2lGOGNGdjktSFpzYThVWUo0TFBxenlsZWFFMk1oTzh3bDAzUTdsbnFwekNkNUh1YlpGbWtvUTAzWkl2eXRUTnNkazlaUDc5dWxdsfgdsOUdtQ1daVWFucnNjUUw0R256Z3JEcy03N3BpVF9uOG5WYmg5eGtBUUFBQUFFQUFJdXhQRzVfMmhyaE1wQzBJNXhNQnJMOHkxeDBLLUlSSEVFbkRSMUR6LWd5Qi1XZmU1QnFFQU1PeC1kQmFPalhXS0hWcWd2dHFYQTVDV1BWTlRHY0RlUVlVOTk3eEpdsgfFHUWlOSEJ1U3llNlFxWnJYUGRCYVlZbHY0OXZGYUVac29UQ3Z0NlF1QjcwTTNtUnotWTgxUmRYemdpbnNOMFBlYXVmWnBqcjREZ0FkcDlKaE81UmxQM3hFQjlhbTd0VHgyb1FWbHN0bEk2QTA5ZjhDVVp2bm8zZmwySHhzX2NZTnhGMDhaY20xVldDcGQzSE4tc1lJeUVTQmtZVHU5M25WNC16OTNkWmJyVDExZnFoWmFKaFBFb1ZtaHpqYzdqZW1lVWNuT3FHRU1OcEZvZXNickY5VzJsU0V1aFJEZnh4SVRgdsJJeEdDbmx0YjR0UkFyRTRiS0NuYkR3bFEycmpGX1kyT1hJQXJXbWV2TzdOUEdBN2lLQk8wZEhycVY4V29nVmxSUmc1d1lFVW9MbjFobTNGczF5VG1CZEhRQW1jRHJ5Rl9CWTZ5YmxPbmVsdnQyY05hcjVSMld0XzJiY3JQQnFGQnRMd3RCYUh1TWFDOGVSZyIsInRva2VuX3R5cGUiOiJiZWFyZXIiLCJleHBpcmVzX2luIjoyODgwMCwicmVmcmVzaF90b2tlbiI6IjJSdzEhSUFBQUFMU2RDVXlxN1M1aWpDRDg2TzlOcy1XSjBSZUUxeUZYbGNwV0p0TEVEcmpiY1FBQUFBRWNNb3gxd2UyOVlCM1BzcXJzRDdQZmptbWV3MXFGV0tXQl9TM0Y1ODBXUVYxRkhGOW5NQXY5ZnFjeC1vbDg4ZXR6NlhGZERYd1FIOXlQQi1YSFkyZ3BYMmUxdnAtbzdDZzRUQfdskMjAtajBSNEU0Y3BwYVpaaUdxR1NrVUxHRjdoenJ5ZWs1UF9xSG9FMEppRnBGa1duIn0=</binaryContent><asciiContent>{"access_token":"AAEAAFoKFuzaW9kRnhmzkk1SDJc7mlad54Mfc2S8iVJxMRWb8bIdc5ANOJdSwYJDGs0_jFavVhdUCwNPfgzZfXTaUNZJHAF_J9qMgfds0Hw5dGerotk8wEsmIjfcExlTQ6wbFpI4EBNdeYKUQZdQrVFWqUsMHuHI889XEhqqSWg9aqmM8pMmZC89eLnFGs2iv-2PBvr8DMh2CirUv1NS9_QwFkPuMFaz7rZ2kPxFpS_iF8cFv9-HZsa8UYJ4LPqzyleaE2MhO8wl03Q7lnqpzCd5HubZFmkoQ03ZIvytTNsdk9ZP79ulg9GmgfdsWZUanrscQL4GnzgrDs-77piT_n8nVbh9xkFDSAAAAEAAIuxPG5_2hrhMpC0I5xMBrL8y1x0K-IRHEEnDR1Dz-gyB-Wfe5BqEAMOx-dBaOjXWKHVqgvtqXA5CWPVNTGcDeQYU997xJuwt0tqGQiNHBuSye6QqZrXPdBaYYlv49vFaEZsoTCvt6QuB70M3mRz-Y81RdXzginsN0PeaufZpjr4DgAdp9JhO5RlP3xEB9am7tTx2oQVlstlI6A09f8CUZvno3fl2Hxs_cYNxF08Zcm1VWCpd3HN-fdsYTu93nV4-z93dZbrT11fqhZaJhPEoVmhzjc7jemeUcnOqGEMNpFoesbrF9W2lSEuhRDfxxITcbrIxGCnltb4tRArE4bKCnbDwlQ2rjF_Y2OXIArWmevO7NPGA7iKBO0dHrqV8WogVlRRg5wYEUoLn1hm3Fs1yTmBdHQAmcDryF_BY6yblOnelvt2cNar5R2Wt_2bcrPBqFBtLwtBaHuMaC8eRg","token_type":"bearer","expires_in":28800,"refresh_token":"2Rw1!IAAAALSdCUyq7S5ijCD86O9Ns-WJ0ReE1yFXlcpWJtLEDrjbcQAAAAEcMox1we29YB3PsqrsD7Pfjmmew1qFWKWB_S3F580WQV1FHF9nMAv9fqcx-ol88etz6XFdDXwQH9yPB-XHY2gpX2e1vp-o7Cg4TBDd20-j0R4E4cppaZZiGqGSkULGF7hzrygfdsk5P_qHoE0JiFpFkWn"}</asciiContent><Headers><Content-Type>application/json; charset=utf-8</Content-Type><Date>Wed, 13 Jun 2018 22:40:47 GMT</Date></Headers><DynamicHeaders><Header><Name>x-powered-by</Name><Value>ASP.NET</Value></Header></DynamicHeaders></RequestActivityOutput>",
       "error":{
         "errorType":"",
         "severity":"",
         "errorCode":"",
         "errorMessage":"",
         "stackTrace":"",
         "fullClass":"",
         "class":"",
         "processStack":"",
    	 "engineName":"",
    	 "processID":"",
    	 "restartedFromCheckpoint":""
       },
        "AdditionalInformation": {
          "name": "NUMBER",
          "value": "084325672"
       }
}

That's an interesting question. As shown, it's not even close to being valid JSON. The "payload" element in the JSON contains unescaped double quotes and embedded JSON. But then I see a whole bunch of &lt; entities in it. Can you edit your post, select the JSON and click on </> in the toolbar above the composition window?

If it really looks like that you will have to cleave out the payload before using a json filter. That could be done using a grok to capture the trailing [^>]+$, followed by a gsub to remove it. Then a dissect to split the payload away. Then rejoin the grok capture to the first dissect capture and send that through a json filter. Then start mutating the payload to be valid XML. But I would like to see what the JSON actually looks like.

That helped out a little, however it's still not in the same format as the original json.

Let's try something different. Insert lines containing ~~~ before and after the JSON.

OK, updated. Never knew about that trick.

    # Capture the JSON after payload, then remove it
    grok { match => { "message" => '>",(?<endOfJson>[^>]+)$' } }
    mutate { gsub => [ "message", ">[^>]+$", "" ] }

    # Split the remainder into the initial JSON and the payload
    dissect { mapping => { "message" => '%{startOfJson}"payload":"%{payload}' } }

    # Fix up the XML. I am mystified where that >> comes from, but get rid of it!
    mutate { gsub => [ "payload", "&lt;", "<", "payload", "$", ">" , "payload", ">>", ">" ] }

    # Parse the XML
    xml { source => "payload" store_xml => true target => "theXML" force_array => false }

    # Create some valid json and parse it. Use target in json filter?
    mutate { add_field => { "wholeJson" => "%{startOfJson}%{endOfJson}" } }
    json { source => "wholeJson" }

    #mutate { remove_field => [ "startOfJson", "endOfJson", "wholeJson", "payload" ] }

And if you do not like the format of DynamicHeaders then you can change it using this.

That is perfect, thank you Badger!

The one thing it does not get you is what I would have expected xpath to see as /RequestActivityOutput/Header/text(), which contains the response headers such as Set-Cookie. That's related to the non-existent >>, which I think is an artifact of the way rubydebug displays lines containing embedded CRs.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.