Struggling to parse JSON key/value pairs

Hi,
I'm struggling to parse JSON Log4j2 output, as the context map (MDC) looks to be stored as an array of JSON objects (I think!).
What's the best way of taking the KV pairs out of this:

{"timeMillis":1456156644547,"thread":"main","level":"DEBUG","loggerName":"DatabaseChecker","message":"No Errors","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":[{"key":"cmap_x-forward","value":"1.1.1.1"},{"key":"cmap_request_uri","value":"/test/URL/endpoint"},{"key":"cmap_server_name","value":"test.server.1"},{"key":"cmap_principal","value":"user1"},{"key":"cmap_user_company","value":"Test"},{"key":"cmap_user_org_type","value":"A"},{"key":"cmap_session_id","value":"1111111111111111111111"},{"key":"cmap_user_email","value":"user@user.com"},{"key":"cmap_http_method","value":"POST"},{"key":"cmap_instance_name","value":"instance1"},{"key":"cmap_http_headers","value":"testingheaders"},{"key":"cmap_remote_address","value":"127.0.0.1"}]}

It's the array of {"key":"cmap_remote_address","value":"127.0.0.1"}-style objects that has me confused!

kv {
  source => "contextMap"
}

didn't seem to work for me, it threw:
:exception=>java.lang.ClassCastException: org.jruby.RubyArray cannot be cast to org.jruby.RubyIO, :level=>:warn}

Has anyone parsed this kind of input before? If it helps I'm using Log4J2 with these options:
<JSONLayout eventEol="true" compact="true" properties="true" />

I've been trying to get the Logstash JSON event format to work, but I was having trouble with the MDC/KeyValue pairs throwing unmodifiable Map errors. If I can get an output format that works with Logstash I'll be able to start using an S3 input :smile:

If anyone has any tips or ideas on this I'd be very grateful :blush:

Use a json filter or codec to parse these events, not kv.
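For example, with a json codec on the input, each event is parsed as it enters the pipeline (a minimal sketch; stdin is just a stand-in for whatever input you're actually using):

input {
  stdin {
    # Parse each incoming line as a JSON document.
    codec => json
  }
}
output { stdout { codec => rubydebug } }

A json filter with source => "message" achieves the same thing after the event has been read.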

Thanks Magnus,
Not sure if I'm still doing it wrong, but I got this error:
{:timestamp=>"2016-02-22T20:43:43.681000+0000", :message=>"Trouble parsing json", :source=>"contextMap", :raw=>[{"key"=>"cmap_x-forward", "value"=>"1.1.1.1"}, {"key"=>"cmap_request_uri", "value"=>"/test/URL/endpoint"}, {"key"=>"cmap_server_name", "value"=>"test.server.1"}, {"key"=>"cmap_principal", "value"=>"user1"}, {"key"=>"cmap_user_company", "value"=>"Test"}, {"key"=>"cmap_user_org_type", "value"=>"A"}, {"key"=>"cmap_session_id", "value"=>"1111111111111111111111"}, {"key"=>"cmap_user_email", "value"=>"user@user.com"}, {"key"=>"cmap_http_method", "value"=>"POST"}, {"key"=>"cmap_instance_name", "value"=>"instance1"}, {"key"=>"cmap_http_headers", "value"=>"testingheaders"}, {"key"=>"cmap_remote_address", "value"=>"127.0.0.1"}], :exception=>java.lang.ClassCastException: org.jruby.RubyArray cannot be cast to org.jruby.RubyIO, :level=>:warn}

I stripped the filter config down to:
filter {
  json {
    source => "contextMap"
  }
}

I did just notice that this test system is running Logstash 1.5.5; could that be causing me issues? I have another cluster set up if you think I should try a newer version.

Well, this works in Logstash 2.2.1:

$ cat test.config
input { stdin { } }
output { stdout { codec => rubydebug } }
filter { json { source => "message" } }
$ cat data
{"timeMillis":1456156644547,"thread":"main","level":"DEBUG","loggerName":"DatabaseChecker","message":"No Errors","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":[{"key":"cmap_x-forward","value":"1.1.1.1"},{"key":"cmap_request_uri","value":"/test/URL/endpoint"},{"key":"cmap_server_name","value":"test.server.1"},{"key":"cmap_principal","value":"user1"},{"key":"cmap_user_company","value":"Test"},{"key":"cmap_user_org_type","value":"A"},{"key":"cmap_session_id","value":"1111111111111111111111"},{"key":"cmap_user_email","value":"user@user.com"},{"key":"cmap_http_method","value":"POST"},{"key":"cmap_instance_name","value":"instance1"},{"key":"cmap_http_headers","value":"testingheaders"},{"key":"cmap_remote_address","value":"127.0.0.1"}]}
$ /opt/logstash/bin/logstash -f test.config < data
Settings: Default pipeline workers: 2
Logstash startup completed
{
       "message" => "No Errors",
      "@version" => "1",
    "@timestamp" => "2016-02-22T20:55:50.973Z",
          "host" => "hallonet",
    "timeMillis" => 1456156644547,
        "thread" => "main",
         "level" => "DEBUG",
    "loggerName" => "DatabaseChecker",
    "endOfBatch" => false,
    "loggerFqcn" => "org.apache.logging.slf4j.Log4jLogger",
    "contextMap" => [
        [ 0] {
              "key" => "cmap_x-forward",
            "value" => "1.1.1.1"
        },
        [ 1] {
              "key" => "cmap_request_uri",
            "value" => "/test/URL/endpoint"
        },
        [ 2] {
              "key" => "cmap_server_name",
            "value" => "test.server.1"
        },
        [ 3] {
              "key" => "cmap_principal",
            "value" => "user1"
        },
        [ 4] {
              "key" => "cmap_user_company",
            "value" => "Test"
        },
        [ 5] {
              "key" => "cmap_user_org_type",
            "value" => "A"
        },
        [ 6] {
              "key" => "cmap_session_id",
            "value" => "1111111111111111111111"
        },
        [ 7] {
              "key" => "cmap_user_email",
            "value" => "user@user.com"
        },
        [ 8] {
              "key" => "cmap_http_method",
            "value" => "POST"
        },
        [ 9] {
              "key" => "cmap_instance_name",
            "value" => "instance1"
        },
        [10] {
              "key" => "cmap_http_headers",
            "value" => "testingheaders"
        },
        [11] {
              "key" => "cmap_remote_address",
            "value" => "127.0.0.1"
        }
    ]
}
Logstash shutdown completed

Thanks Magnus,
I really appreciate you testing it out :slightly_smiling:
I've tried the following:
ubuntu@ip-10-91-161-82:~$ cat s3.conf
input { stdin { type => "s3_test_sg" } }
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "logstash-%{type}-%{+YYYY.MM.dd}"
  }
}
filter { json { source => "message" } }
ubuntu@ip-10-91-161-82:~$ cat jsontest.log
{"timeMillis":1456156644547,"thread":"main","level":"DEBUG","loggerName":"DatabaseChecker","message":"No Errors","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":[{"key":"cmap_x-forward","value":"1.1.1.1"},{"key":"cmap_request_uri","value":"/test/URL/endpoint"},{"key":"cmap_server_name","value":"test.server.1"},{"key":"cmap_principal","value":"user1"},{"key":"cmap_user_company","value":"Test"},{"key":"cmap_user_org_type","value":"A"},{"key":"cmap_session_id","value":"1111111111111111111111"},{"key":"cmap_user_email","value":"user@user.com"},{"key":"cmap_http_method","value":"POST"},{"key":"cmap_instance_name","value":"instance1"},{"key":"cmap_http_headers","value":"testingheaders"},{"key":"cmap_remote_address","value":"127.0.0.1"}]}
with 1.5.4 and 2.2.2, when checking in Kibana it looks like this:

Is there a way to get it to process the context maps like:
"cmap_x-forward" => "1.1.1.1",
"cmap_request_uri" => "/test/URL/endpoint"

etc.? I'm starting to think I may need to learn a bit of Ruby to process this format :slight_smile:

I don't think Kibana supports arrays of objects. Yes, you may need to use a ruby filter to process the data further.
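For example, a ruby filter along these lines would flatten the array directly, without any string rewriting (a rough sketch, written against the Logstash 2.x event API used elsewhere in this thread and untested against your exact events):

filter {
  if [contextMap] {
    ruby {
      code => "
        # Turn [{'key' => k, 'value' => v}, ...] into top-level k => v fields.
        (event['contextMap'] || []).each do |pair|
          event[pair['key']] = pair['value'] if pair['key']
        end
        event.remove('contextMap')
      "
    }
  }
}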

Thanks for your help with this, I'll post back if I can figure out the Ruby for it... I'm very much a Ruby noob, though, so it might take a while!

I think I've managed to get it working now. I'm sure there are neater ways of doing this, but here is what I've come up with:
filter {
  ruby {
    code => "
      temp = event['contextMap']
      event['contextMap'] = temp.join('')
    "
  }
  mutate { gsub => [ 'contextMap', '[\"]', '^' ] }
  ruby {
    code => "
      temp = event['contextMap']
      event['contextMap'] = temp.gsub('^key^=>', ' ').gsub(', ^value^=>^', '=^')
    "
  }
  mutate { gsub => [ 'contextMap', '\^', '"' ] }
  mutate { gsub => [ 'contextMap', '[{}]', '' ] }
  mutate { gsub => [ 'contextMap', '\"', '' ] }
  kv { source => "contextMap" }
}

This was the output:
ubuntu@ip-10-91-161-82:~/logstash-2.2.2/bin$ ./logstash -f ~/s3.conf < ~/jsontest.log
Settings: Default pipeline workers: 2
Logstash startup completed
{
             "timeMillis" => 1456156644547,
                 "thread" => "main",
                  "level" => "DEBUG",
             "loggerName" => "DatabaseChecker",
                "message" => "No Errors",
             "endOfBatch" => false,
             "loggerFqcn" => "org.apache.logging.slf4j.Log4jLogger",
             "contextMap" => " cmap_x-forward=1.1.1.1 cmap_request_uri=/test/URL/endpoint cmap_server_name=test.server.1 cmap_principal=user1 cmap_user_company=Test cmap_user_org_type=A cmap_session_id=1111111111111111111111 cmap_user_email=user@user.com cmap_http_method=POST cmap_instance_name=instance1 cmap_http_headers=testingheaders cmap_remote_address=127.0.0.1",
               "@version" => "1",
             "@timestamp" => "2016-02-23T11:51:41.003Z",
                   "type" => "s3_test_sg",
                   "host" => "ip-10-91-161-82",
         "cmap_x-forward" => "1.1.1.1",
       "cmap_request_uri" => "/test/URL/endpoint",
       "cmap_server_name" => "test.server.1",
         "cmap_principal" => "user1",
      "cmap_user_company" => "Test",
     "cmap_user_org_type" => "A",
        "cmap_session_id" => "1111111111111111111111",
        "cmap_user_email" => "user@user.com",
       "cmap_http_method" => "POST",
     "cmap_instance_name" => "instance1",
      "cmap_http_headers" => "testingheaders",
    "cmap_remote_address" => "127.0.0.1"
}
Logstash shutdown completed

Is this likely to be quite CPU intensive if it has to do all this processing?

I've added some extra config to account for a stack trace when an exception is thrown, and I've got it to parse either as KV pairs or as JSON. Is there a way to put the parsed JSON into a nested format?
In this case, version is a field inside thrown.extendedStackTrace2, but it's showing as simply "version". I'd like to be able to get it into:
thrown.extendedStackTrace2.version

Do you know if that is possible?

The input I used was:
{"timeMillis":1456230423967,"thread":"main","level":"DEBUG","loggerName":"DatabaseChecker","message":"error thrown","thrown":{"localizedMessage":"testError","message":"testError","name":"com.test.armoury.exception.checked.TechnicalException","extendedStackTrace":[{"class":"DatabaseChecker","method":"main","file":"DatabaseChecker.java","line":83,"exact":true,"location":"classes/","version":"?"}]},"endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":[{"key":"cmap_x-forward","value":"1.1.1.1"},{"key":"cmap_request_uri","value":"/test/URL/endpoint"},{"key":"cmap_server_name","value":"test.server.1"},{"key":"cmap_principal","value":"user1"},{"key":"cmap_user_company","value":"Test Company"},{"key":"cmap_user_org_type","value":"A"},{"key":"cmap_session_id","value":"1111111111111111111111"},{"key":"cmap_user_email","value":"test@test.com"},{"key":"cmap_http_method","value":"POST"},{"key":"cmap_instance_name","value":"instance1"},{"key":"cmap_http_headers","value":"testingheaders"},{"key":"cmap_remote_address","value":"127.0.0.1"}]}

And my Filter is now:
filter {
  ruby {
    code => "
      temp = event['contextMap']
      event['contextMap'] = temp.join('')
    "
  }
  mutate { gsub => ['contextMap', '[\"]', '^'] }
  ruby {
    code => "
      temp = event['contextMap']
      event['contextMap'] = temp.gsub('^key^=>', ' ').gsub(', ^value^=>^', '=^')
    "
  }
  mutate { gsub => ['contextMap', '\^', '"'] }
  mutate { gsub => ['contextMap', '[{}]', ''] }
  mutate { gsub => ['contextMap', '\"', ''] }
  if [thrown] {
    ruby {
      code => "
        temp = event['thrown']['extendedStackTrace']
        event['thrown']['extendedStackTrace'] = temp.join('').gsub('=>', '=')
      "
    }
  }
  mutate { gsub => ['[thrown][extendedStackTrace]', '[\"]', '^'] }
  mutate { gsub => ['[thrown][extendedStackTrace]', '\^', '"'] }
  ruby {
    code => "
      temp = event['thrown']['extendedStackTrace']
      event['thrown']['extendedStackTrace2'] = temp
    "
  }
  mutate { gsub => ['[thrown][extendedStackTrace2]', '=', ':'] }
  mutate { gsub => ['[thrown][extendedStackTrace]', '[{}]', ''] }
  mutate { gsub => ['[thrown][extendedStackTrace]', '\"', ''] }
  json { source => '[thrown][extendedStackTrace2]' }
  kv { source => "contextMap" }
}

Thanks for your help with this @magnusbaeck :smiley:


You can use the json filter to parse a JSON string into subfields of another field by using the target option. In your case maybe you want to rename thrown.extendedStackTrace and thrown.extendedStackTrace2 to get them out of the way so that you can parse the JSON string into thrown.extendedStackTrace.
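For instance (a sketch; the field names follow your config above):

filter {
  # Parse the reassembled JSON string back into the same field,
  # producing a nested object instead of top-level fields.
  json {
    source => "[thrown][extendedStackTrace2]"
    target => "[thrown][extendedStackTrace2]"
  }
}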

Hi Magnus,
I was just creating those duplicates so I could play around with a KV version and JSON version.

I've tidied it up a bit now so it parses into nested JSON (which displays as thrown.extendedStackTrace.class in Kibana), thanks for the tip on "target"!

Would it be useful if I put this in a gist somewhere? I've not made one before, but it could be useful for people using Log4j2. I'm very new to Ruby, so there are probably better ways of substituting strings etc.; if it's in a gist, Ruby people would be free to improve it :slight_smile:

I'll be testing this out with some real logs tomorrow to see how it processes them... hopefully it all works perfectly :smiley:


OK, so the real logs work well, unless a contextMap entry has no value, for example:
{"timeMillis":1456403845087,"thread":"http-apr-8280-exec-7","level":"DEBUG","loggerName":"com.test.util.ApplicationCache","message":"TEST Code=TEST_CODE","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":[{"key":"user_company"},{"key":"server_name","value":"localhost"},{"key":"http_method","value":"GET"},{"key":"session_id","value":"98FA8D84EA6B713A9E35F3115EA0C9B0"},{"key":"x-forward"},{"key":"principal"},{"key":"http_headers","value":"host=localhost:8280;user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0;accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8;accept-language=en-US,en;q=0.5;accept-encoding=gzip, deflate;cookie=Timezone.selectedCountry=UK; testcompanyfontsize=small; testcompanyhighcontrast=off; basketpopfollow=true; disableMultiItems=false; pauseOffers=false; JSESSIONID=98FA8D84EA6B713A9E35F3115EA0C9B0; basketpop=false; __utma=111872281.1205185189.1452769986.1455890844.1455897324.5; __utmz=111872281.1452769986.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);connection=keep-alive"},{"key":"remote_address","value":"127.0.0.1"},{"key":"instance_name","value":"test.local"},{"key":"user_org_type"},{"key":"user_email"},{"key":"request_uri","value":"/test/login.do"}]}

It gives a JSON parse failure where the nested JSON has a key but no value; see user_email or user_org_type for an example. As (I think) this should still be valid JSON, I thought it would create a field with no value or just drop it. If that's not possible I can make a code change to initialise all the values in the contextMap as blank strings... but would prefer to avoid that if possible :smiley:

The string above is indeed valid JSON. If Logstash doesn't like it please post the full error message.

I think it must be something to do with how I'm transforming the data from:
{"key": "test", "value":"val1"}
to
{"test":"val1"}

when there is no value it returns:
{"instance_name":"local.instance","user_org_type","user_email","request_uri":"/test/URL"}

Which is no longer valid JSON. I'll look at making a code change to either set them to "" if the actual value isn't set, or see about not adding the entry to the MDC if the value is null.
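For what it's worth, building the map directly in a ruby filter sidesteps the string rewriting entirely, so entries without a value can be defaulted rather than producing invalid JSON (a variant of the sketch earlier in the thread; the empty-string default is a choice, you could equally skip those keys):

filter {
  if [contextMap] {
    ruby {
      code => "
        map = {}
        (event['contextMap'] || []).each do |pair|
          # Entries like {'key' => 'user_email'} carry no 'value';
          # default them to '' instead of emitting a bare key.
          map[pair['key']] = pair['value'] || '' if pair['key']
        end
        event['contextMap'] = map
      "
    }
  }
}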

Hi Stephen,

I'm facing the same issue trying to break apart the contextMap. Did you find a different solution for this? How resource-intensive is your solution?

I think I will also plug into JMX to see how Logstash is performing.

For instance, I want to create metrics counting the number of GET, POST, UPDATE, etc. requests that I am logging with the Log4j2 Kafka appender.
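As an aside, Logstash's metrics filter can do that kind of counting once the contextMap fields have been extracted. A sketch, assuming the cmap_http_method field from the examples earlier in the thread:

filter {
  # Emit a rate-tracking event every flush interval, tagged 'metric',
  # with one meter per observed HTTP method.
  metrics {
    meter => ["http_%{cmap_http_method}"]
    add_tag => "metric"
  }
}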

I would very much appreciate any further insights that you might have collected in the past months.

Hi @webmutation,
I haven't been able to look at this for a while as it's been really busy at work.
This was the filter I was using:
filter {
  if [contextMap] {
    ruby {
      code => "
        temp = event['contextMap']
        event['contextMap'] = temp.join('')
      "
    }
    mutate { gsub => ['contextMap', '[\"]', '^'] }
    ruby {
      code => "
        temp = event['contextMap']
        event['contextMap'] = temp.gsub('^key^=>', '').gsub(', ^value^=>^', ':^')
      "
    }
    mutate {
      gsub => ['contextMap', '\^', '"']
      gsub => ['contextMap', '}{', ',']
    }
    json { source => "contextMap" target => "contextMap" }
  }
  if [thrown] {
    ruby {
      code => "
        temp = event['thrown']['extendedStackTrace']
        event['thrown']['extendedStackTrace'] = temp.join('')
      "
    }
    mutate { gsub => ['[thrown][extendedStackTrace]', '=>', ':'] }
    json { source => '[thrown][extendedStackTrace]' target => '[thrown][extendedStackTrace]' }
  }
  date {
    match => [ "timeMillis", "UNIX_MS" ]
    timezone => "Europe/London"
  }
}

Hopefully that's of some use to you. It did have issues when there was an MDC field with no value; I made a code change to get rid of these. Perhaps someone with Ruby experience would be able to write something into the filter to handle those cases.

I think it is quite resource-hungry; again, someone may be able to make it better... I've never used Ruby before :blush:


Thank you very much Stephen!

I will pick up where you left off and try to contribute. I am new to Ruby as well :blush: but your work will give me a strong base to build on.