Filtering json with embedded xml


(Josh Speer) #1

I'm pulling json messages from a jms queue, where most of them filter properly. However there are specific messages with XML payload. What would be the best way to filter this type of message?

{
       "envCode": "RWCS",
       "appCode": "TIBCO",
       "type": "INFO",
       "hostName": "",
       "interfaceCode": "TIB_EXP.EMAIL",
       "interfaceName":"EMAIL",
       "transactionID":"6130814352546724324134",
       "transactionDomain":"",
       "transactionType":"",
       "brand":"TST",
       "timestamp": "2018-06-16T15:40:47.225-07:00",
       "auditStep": "REPORT", 
       "message": "GET TOKEN PROCESS SUCCESS",
       "payload":"<?xml version="1.0" encoding="UTF-8"?>
    <RequestActivityOutput><Header>HTTP/1.1 200 OK
    Content-Type: application/json; charset=utf-8
    Date: Wed, 13 Jun 2018 22:40:47 GMT
    cache-control: no-cache, no-store, must-revalidate
    connection: close
    expires: 0
    x-powered-by: ASP.NET
    set-cookie: ASP.NET_SessionId=fdsafsaf40pu2mgwtzwm; path=/; secure; HttpOnly
    set-cookie: xyz_adm_adm_org_user_language_code=en; expires=Tue, 11-Sep-2018 22:40:48 GMT; path=/
    set-cookie: xyz_adm_adm_org_user_locale_code=en-US; expires=Tue, 11-Sep-2018 22:40:48 GMT; path=/
    set-cookie: xyz_adm_adm_org_user_time_zone_id=; expires=Tue, 11-Sep-2018 22:40:48 GMT; path=/
    pragma: no-cache
    </Header><statusLine><httpVersion>HTTP/1.1</httpVersion><statusCode>200</statusCode><reasonPhrase>OK</reasonPhrase></statusLine><binaryContent>eyJhY2Nlc3NfdG9rZW4iOiJBQUVBQUZvS0Z1emFXOWtSbmhtemtrMVNESmM3bWxhZDU0TWZjMlM4aVZKeE1SV2I4YklkYzVBTk9KZFN3WUpER3MwX2pGYXZWaGRVQ3dOUGZnelpmWFRhVU5gfdsRl9KOXFNUUxzYncwSHc1ZEdlcm90azh3RXNtSWpmY0V4bFRRNndiRnBJNEVCTmRlWUtVUVpkUXJWRldxVXNNSHVISTg4OVhFaHFxU1dnOWFxbU04cE1tWkM4OWVMbkZHczJpdi0yUEJ2cjhETWgyQ2lyVXYxTlM5X1F3RmtQdU1GYXo3cloya1B4RnBTX2lGOGNGdjktSFpzYThVWUo0TFBxenlsZWFFMk1oTzh3bDAzUTdsbnFwekNkNUh1YlpGbWtvUTAzWkl2eXRUTnNkazlaUDc5dWxdsfgdsOUdtQ1daVWFucnNjUUw0R256Z3JEcy03N3BpVF9uOG5WYmg5eGtBUUFBQUFFQUFJdXhQRzVfMmhyaE1wQzBJNXhNQnJMOHkxeDBLLUlSSEVFbkRSMUR6LWd5Qi1XZmU1QnFFQU1PeC1kQmFPalhXS0hWcWd2dHFYQTVDV1BWTlRHY0RlUVlVOTk3eEpdsgfFHUWlOSEJ1U3llNlFxWnJYUGRCYVlZbHY0OXZGYUVac29UQ3Z0NlF1QjcwTTNtUnotWTgxUmRYemdpbnNOMFBlYXVmWnBqcjREZ0FkcDlKaE81UmxQM3hFQjlhbTd0VHgyb1FWbHN0bEk2QTA5ZjhDVVp2bm8zZmwySHhzX2NZTnhGMDhaY20xVldDcGQzSE4tc1lJeUVTQmtZVHU5M25WNC16OTNkWmJyVDExZnFoWmFKaFBFb1ZtaHpqYzdqZW1lVWNuT3FHRU1OcEZvZXNickY5VzJsU0V1aFJEZnh4SVRgdsJJeEdDbmx0YjR0UkFyRTRiS0NuYkR3bFEycmpGX1kyT1hJQXJXbWV2TzdOUEdBN2lLQk8wZEhycVY4V29nVmxSUmc1d1lFVW9MbjFobTNGczF5VG1CZEhRQW1jRHJ5Rl9CWTZ5YmxPbmVsdnQyY05hcjVSMld0XzJiY3JQQnFGQnRMd3RCYUh1TWFDOGVSZyIsInRva2VuX3R5cGUiOiJiZWFyZXIiLCJleHBpcmVzX2luIjoyODgwMCwicmVmcmVzaF90b2tlbiI6IjJSdzEhSUFBQUFMU2RDVXlxN1M1aWpDRDg2TzlOcy1XSjBSZUUxeUZYbGNwV0p0TEVEcmpiY1FBQUFBRWNNb3gxd2UyOVlCM1BzcXJzRDdQZmptbWV3MXFGV0tXQl9TM0Y1ODBXUVYxRkhGOW5NQXY5ZnFjeC1vbDg4ZXR6NlhGZERYd1FIOXlQQi1YSFkyZ3BYMmUxdnAtbzdDZzRUQfdskMjAtajBSNEU0Y3BwYVpaaUdxR1NrVUxHRjdoenJ5ZWs1UF9xSG9FMEppRnBGa1duIn0=</binaryContent><asciiContent>{"access_token":"AAEAAFoKFuzaW9kRnhmzkk1SDJc7mlad54Mfc2S8iVJxMRWb8bIdc5ANOJdSwYJDGs0_jFavVhdUCwNPfgzZfXTaUNZJHAF_J9qMgfds0Hw5dGerotk8wEsmIjfcExlTQ6wbFpI4EBNdeYKUQZdQrVFWqUsMHuHI889XEhqqSWg9aqmM8pMmZC89eLnFGs2iv-2PBvr8DMh2CirUv1NS9_QwFkPuMFaz7rZ2kPxFpS_iF8cFv9-HZsa8UYJ4LPqzyleaE2MhO8wl03Q7lnqpzCd5HubZFmkoQ03ZIvytTNsdk9ZP79ulg9GmgfdsWZUanrscQL4GnzgrDs-77piT_n8nVbh9xkFDSAAAAEAAIuxPG5_2hrhMpC0I5xMBrL8y1x0K-IRHEEnDR1Dz-gyB-Wfe5BqEAMOx-dBaOjXWKHVqgvtqXA5CWPVNTGcDeQYU997xJuwt0tqGQiNHBuSye6QqZrXPdBaYYlv49vFaEZsoTCvt6QuB70M3mRz-Y81RdXzginsN0PeaufZpjr4DgAdp9JhO5RlP3xEB9am7tTx2oQVlstlI6A09f8CUZvno3fl2Hxs_cYNxF08Zcm1VWCpd3HN-fdsYTu93nV4-z93dZbrT11fqhZaJhPEoVmhzjc7jemeUcnOqGEMNpFoesbrF9W2lSEuhRDfxxITcbrIxGCnltb4tRArE4bKCnbDwlQ2rjF_Y2OXIArWmevO7NPGA7iKBO0dHrqV8WogVlRRg5wYEUoLn1hm3Fs1yTmBdHQAmcDryF_BY6yblOnelvt2cNar5R2Wt_2bcrPBqFBtLwtBaHuMaC8eRg","token_type":"bearer","expires_in":28800,"refresh_token":"2Rw1!IAAAALSdCUyq7S5ijCD86O9Ns-WJ0ReE1yFXlcpWJtLEDrjbcQAAAAEcMox1we29YB3PsqrsD7Pfjmmew1qFWKWB_S3F580WQV1FHF9nMAv9fqcx-ol88etz6XFdDXwQH9yPB-XHY2gpX2e1vp-o7Cg4TBDd20-j0R4E4cppaZZiGqGSkULGF7hzrygfdsk5P_qHoE0JiFpFkWn"}</asciiContent><Headers><Content-Type>application/json; charset=utf-8</Content-Type><Date>Wed, 13 Jun 2018 22:40:47 GMT</Date></Headers><DynamicHeaders><Header><Name>x-powered-by</Name><Value>ASP.NET</Value></Header></DynamicHeaders></RequestActivityOutput>",
       "error":{
         "errorType":"",
         "severity":"",
         "errorCode":"",
         "errorMessage":"",
         "stackTrace":"",
         "fullClass":"",
         "class":"",
         "processStack":"",
    	 "engineName":"",
    	 "processID":"",
    	 "restartedFromCheckpoint":""
       },
        "AdditionalInformation": {
          "name": "NUMBER",
          "value": "084325672"
       }
}

#2

That's an interesting question. As shown, it's not even close to being valid JSON. The "payload" element in the JSON contains unescaped double quotes and embedded JSON. But then I see a whole bunch of &lt; entities in it. Can you edit your post, select the JSON and click on </> in the toolbar above the composition window?

If it really looks like that you will have to cleave out the payload before using a json filter. That could be done using a grok to capture the trailing [^>]+$, followed by a gsub to remove it. Then a dissect to split the payload away. Then rejoin the grok capture to the first dissect capture and send that through a json filter. Then start mutating the payload to be valid XML. But I would like to see what the JSON actually looks like.


(Josh Speer) #3

That helped out a little, however it's still not in the same format as the original json.


#4

Let's try something different. Insert lines containing ~~~ before and after the JSON.


(Josh Speer) #5

OK, updated. Never knew about that trick.


#6
    # Capture the JSON after payload, then remove it
    grok { match => { "message" => '>",(?<endOfJson>[^>]+)$' } }
    mutate { gsub => [ "message", ">[^>]+$", "" ] }

    # Split the remainder into the initial JSON and the payload
    dissect { mapping => { "message" => '%{startOfJson}"payload":"%{payload}' } }

    # Fix up the XML. I am mystified where that >> comes from, but get rid of it!
    mutate { gsub => [ "payload", "&lt;", "<", "payload", "$", ">" , "payload", ">>", ">" ] }

    # Parse the XML
    xml { source => "payload" store_xml => true target => "theXML" force_array => false }

    # Create some valid json and parse it. Use target in json filter?
    mutate { add_field => { "wholeJson" => "%{startOfJson}%{endOfJson}" } }
    json { source => "wholeJson" }

    #mutate { remove_field => [ "startOfJson", "endOfJson", "wholeJson", "payload" ] }

And if you do not like the format of DynamicHeaders then you can change it using this.


(Josh Speer) #7

That is perfect, thank you Badger!


#8

The one thing it does not get you is what I would have expected xpath to see as /RequestActivityOutput/Header/text(), which contains the response headers such as Set-Cookie. That's related to the non-existent >>, which I think is an artifact of the way rubydebug displays lines containing embedded CRs.


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.