Filtering json with embedded xml

jspeer · July 6, 2018, 4:36pm

I'm pulling json messages from a jms queue, where most of them filter properly. However there are specific messages with XML payload. What would be the best way to filter this type of message?

{
       "envCode": "RWCS",
       "appCode": "TIBCO",
       "type": "INFO",
       "hostName": "",
       "interfaceCode": "TIB_EXP.EMAIL",
       "interfaceName":"EMAIL",
       "transactionID":"6130814352546724324134",
       "transactionDomain":"",
       "transactionType":"",
       "brand":"TST",
       "timestamp": "2018-06-16T15:40:47.225-07:00",
       "auditStep": "REPORT", 
       "message": "GET TOKEN PROCESS SUCCESS",
       "payload":"&lt;?xml version="1.0" encoding="UTF-8"?>
    &lt;RequestActivityOutput>&lt;Header>HTTP/1.1 200 OK&amp;#xD;
    Content-Type: application/json; charset=utf-8&amp;#xD;
    Date: Wed, 13 Jun 2018 22:40:47 GMT&amp;#xD;
    cache-control: no-cache, no-store, must-revalidate&amp;#xD;
    connection: close&amp;#xD;
    expires: 0&amp;#xD;
    x-powered-by: ASP.NET&amp;#xD;
    set-cookie: ASP.NET_SessionId=fdsafsaf40pu2mgwtzwm; path=/; secure; HttpOnly&amp;#xD;
    set-cookie: xyz_adm_adm_org_user_language_code=en; expires=Tue, 11-Sep-2018 22:40:48 GMT; path=/&amp;#xD;
    set-cookie: xyz_adm_adm_org_user_locale_code=en-US; expires=Tue, 11-Sep-2018 22:40:48 GMT; path=/&amp;#xD;
    set-cookie: xyz_adm_adm_org_user_time_zone_id=; expires=Tue, 11-Sep-2018 22:40:48 GMT; path=/&amp;#xD;
    pragma: no-cache&amp;#xD;
    &lt;/Header>&lt;statusLine>&lt;httpVersion>HTTP/1.1&lt;/httpVersion>&lt;statusCode>200&lt;/statusCode>&lt;reasonPhrase>OK&lt;/reasonPhrase>&lt;/statusLine>&lt;binaryContent>eyJhY2Nlc3NfdG9rZW4iOiJBQUVBQUZvS0Z1emFXOWtSbmhtemtrMVNESmM3bWxhZDU0TWZjMlM4aVZKeE1SV2I4YklkYzVBTk9KZFN3WUpER3MwX2pGYXZWaGRVQ3dOUGZnelpmWFRhVU5gfdsRl9KOXFNUUxzYncwSHc1ZEdlcm90azh3RXNtSWpmY0V4bFRRNndiRnBJNEVCTmRlWUtVUVpkUXJWRldxVXNNSHVISTg4OVhFaHFxU1dnOWFxbU04cE1tWkM4OWVMbkZHczJpdi0yUEJ2cjhETWgyQ2lyVXYxTlM5X1F3RmtQdU1GYXo3cloya1B4RnBTX2lGOGNGdjktSFpzYThVWUo0TFBxenlsZWFFMk1oTzh3bDAzUTdsbnFwekNkNUh1YlpGbWtvUTAzWkl2eXRUTnNkazlaUDc5dWxdsfgdsOUdtQ1daVWFucnNjUUw0R256Z3JEcy03N3BpVF9uOG5WYmg5eGtBUUFBQUFFQUFJdXhQRzVfMmhyaE1wQzBJNXhNQnJMOHkxeDBLLUlSSEVFbkRSMUR6LWd5Qi1XZmU1QnFFQU1PeC1kQmFPalhXS0hWcWd2dHFYQTVDV1BWTlRHY0RlUVlVOTk3eEpdsgfFHUWlOSEJ1U3llNlFxWnJYUGRCYVlZbHY0OXZGYUVac29UQ3Z0NlF1QjcwTTNtUnotWTgxUmRYemdpbnNOMFBlYXVmWnBqcjREZ0FkcDlKaE81UmxQM3hFQjlhbTd0VHgyb1FWbHN0bEk2QTA5ZjhDVVp2bm8zZmwySHhzX2NZTnhGMDhaY20xVldDcGQzSE4tc1lJeUVTQmtZVHU5M25WNC16OTNkWmJyVDExZnFoWmFKaFBFb1ZtaHpqYzdqZW1lVWNuT3FHRU1OcEZvZXNickY5VzJsU0V1aFJEZnh4SVRgdsJJeEdDbmx0YjR0UkFyRTRiS0NuYkR3bFEycmpGX1kyT1hJQXJXbWV2TzdOUEdBN2lLQk8wZEhycVY4V29nVmxSUmc1d1lFVW9MbjFobTNGczF5VG1CZEhRQW1jRHJ5Rl9CWTZ5YmxPbmVsdnQyY05hcjVSMld0XzJiY3JQQnFGQnRMd3RCYUh1TWFDOGVSZyIsInRva2VuX3R5cGUiOiJiZWFyZXIiLCJleHBpcmVzX2luIjoyODgwMCwicmVmcmVzaF90b2tlbiI6IjJSdzEhSUFBQUFMU2RDVXlxN1M1aWpDRDg2TzlOcy1XSjBSZUUxeUZYbGNwV0p0TEVEcmpiY1FBQUFBRWNNb3gxd2UyOVlCM1BzcXJzRDdQZmptbWV3MXFGV0tXQl9TM0Y1ODBXUVYxRkhGOW5NQXY5ZnFjeC1vbDg4ZXR6NlhGZERYd1FIOXlQQi1YSFkyZ3BYMmUxdnAtbzdDZzRUQfdskMjAtajBSNEU0Y3BwYVpaaUdxR1NrVUxHRjdoenJ5ZWs1UF9xSG9FMEppRnBGa1duIn0=&lt;/binaryContent>&lt;asciiContent>{"access_token":"AAEAAFoKFuzaW9kRnhmzkk1SDJc7mlad54Mfc2S8iVJxMRWb8bIdc5ANOJdSwYJDGs0_jFavVhdUCwNPfgzZfXTaUNZJHAF_J9qMgfds0Hw5dGerotk8wEsmIjfcExlTQ6wbFpI4EBNdeYKUQZdQrVFWqUsMHuHI889XEhqqSWg9aqmM8pMmZC89eLnFGs2iv-2PBvr8DMh2CirUv1NS9_QwFkPuMFaz7rZ2kPxFpS_iF8cFv9-HZsa8UYJ4LPqzyleaE2MhO8wl03Q7lnqpzCd5HubZFmkoQ03ZIvytTNsdk9ZP79ulg9GmgfdsWZUanrscQL4GnzgrDs-77piT_n8nVbh9xkFDSAAAAEAAIuxPG5_2hrhMpC0I5xMBrL8y1x0K-IRHEEnDR1Dz-gyB-Wfe5BqEAMOx-dBaOjXWKHVqgvtqXA5CWPVNTGcDeQYU997xJuwt0tqGQiNHBuSye6QqZrXPdBaYYlv49vFaEZsoTCvt6QuB70M3mRz-Y81RdXzginsN0PeaufZpjr4DgAdp9JhO5RlP3xEB9am7tTx2oQVlstlI6A09f8CUZvno3fl2Hxs_cYNxF08Zcm1VWCpd3HN-fdsYTu93nV4-z93dZbrT11fqhZaJhPEoVmhzjc7jemeUcnOqGEMNpFoesbrF9W2lSEuhRDfxxITcbrIxGCnltb4tRArE4bKCnbDwlQ2rjF_Y2OXIArWmevO7NPGA7iKBO0dHrqV8WogVlRRg5wYEUoLn1hm3Fs1yTmBdHQAmcDryF_BY6yblOnelvt2cNar5R2Wt_2bcrPBqFBtLwtBaHuMaC8eRg","token_type":"bearer","expires_in":28800,"refresh_token":"2Rw1!IAAAALSdCUyq7S5ijCD86O9Ns-WJ0ReE1yFXlcpWJtLEDrjbcQAAAAEcMox1we29YB3PsqrsD7Pfjmmew1qFWKWB_S3F580WQV1FHF9nMAv9fqcx-ol88etz6XFdDXwQH9yPB-XHY2gpX2e1vp-o7Cg4TBDd20-j0R4E4cppaZZiGqGSkULGF7hzrygfdsk5P_qHoE0JiFpFkWn"}&lt;/asciiContent>&lt;Headers>&lt;Content-Type>application/json; charset=utf-8&lt;/Content-Type>&lt;Date>Wed, 13 Jun 2018 22:40:47 GMT&lt;/Date>&lt;/Headers>&lt;DynamicHeaders>&lt;Header>&lt;Name>x-powered-by&lt;/Name>&lt;Value>ASP.NET&lt;/Value>&lt;/Header>&lt;/DynamicHeaders>&lt;/RequestActivityOutput>",
       "error":{
         "errorType":"",
         "severity":"",
         "errorCode":"",
         "errorMessage":"",
         "stackTrace":"",
         "fullClass":"",
         "class":"",
         "processStack":"",
    	 "engineName":"",
    	 "processID":"",
    	 "restartedFromCheckpoint":""
       },
        "AdditionalInformation": {
          "name": "NUMBER",
          "value": "084325672"
       }
}

Badger · July 6, 2018, 4:54pm

That's an interesting question. As shown, it's not even close to being valid JSON. The "payload" element in the JSON contains unescaped double quotes and embedded JSON. But then I see a whole bunch of < entities in it. Can you edit your post, select the JSON and click on </> in the toolbar above the composition window?

If it really looks like that you will have to cleave out the payload before using a json filter. That could be done using a grok to capture the trailing [^>]+$, followed by a gsub to remove it. Then a dissect to split the payload away. Then rejoin the grok capture to the first dissect capture and send that through a json filter. Then start mutating the payload to be valid XML. But I would like to see what the JSON actually looks like.

jspeer · July 6, 2018, 5:00pm

That helped out a little, however it's still not in the same format as the original json.

Badger · July 6, 2018, 5:02pm

Let's try something different. Insert lines containing ~~~ before and after the JSON.

jspeer · July 6, 2018, 5:05pm

OK, updated. Never knew about that trick.

Badger · July 6, 2018, 5:32pm

    # Capture the JSON after payload, then remove it
    grok { match => { "message" => '>",(?<endOfJson>[^>]+)$' } }
    mutate { gsub => [ "message", ">[^>]+$", "" ] }

    # Split the remainder into the initial JSON and the payload
    dissect { mapping => { "message" => '%{startOfJson}"payload":"%{payload}' } }

    # Fix up the XML. I am mystified where that >> comes from, but get rid of it!
    mutate { gsub => [ "payload", "&lt;", "<", "payload", "$", ">" , "payload", ">>", ">" ] }

    # Parse the XML
    xml { source => "payload" store_xml => true target => "theXML" force_array => false }

    # Create some valid json and parse it. Use target in json filter?
    mutate { add_field => { "wholeJson" => "%{startOfJson}%{endOfJson}" } }
    json { source => "wholeJson" }

    #mutate { remove_field => [ "startOfJson", "endOfJson", "wholeJson", "payload" ] }

And if you do not like the format of DynamicHeaders then you can change it using this.

jspeer · July 6, 2018, 6:39pm

That is perfect, thank you Badger!

Badger · July 6, 2018, 7:03pm

The one thing it does not get you is what I would have expected xpath to see as /RequestActivityOutput/Header/text(), which contains the response headers such as Set-Cookie. That's related to the non-existent >>, which I think is an artifact of the way rubydebug displays lines containing embedded CRs.

system · August 3, 2018, 7:03pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Convert json to xml Logstash	17	2981	November 14, 2017
XML into JSON value Logstash	6	220	March 10, 2024
Parsing xml message inside json file Logstash	3	1146	August 5, 2019
Parsing filter for nested json Logstash	2	342	July 27, 2020
Parse a json file that includes an xml Logstash	8	558	December 16, 2022

Filtering json with embedded xml

Related topics