Trouble replacing leading/trailing whitespace in nested json

Hey, folks

Trying to figure out a clean way to solve this problem.

I have nested JSON coming in from an SQS queue. I've been mocking it up with a static example file and Filebeat. I know that's not a perfect stand-in, but it has let me iterate cleanly without affecting the other pieces.

The json looks a bit like this:

{"city": "My town ","comments": "all is well","date":"2023-01-26T00:00:00.000Z","email":"myuser@example.com","firstname":"User ","lastname":"Name"}

There may be syntax errors in the above because I mocked it up by hand. My actual tests don't use hand-typed data.

Basically, the issue is that the upstream app breaks when it hits extra whitespace leading or trailing a field value.

In the example above, city and firstname have extra whitespace, but it could be any number of fields, and the example is not complete.

I need the payload to look like the above, but without those leading/trailing whitespace characters.

Using mutate + strip doesn't work. The quotes are part of the string and need to stay that way, and because the whitespace sits inside the quotes, as far as strip is concerned there is no leading or trailing whitespace to remove.

Is there a way to target defined fields and remove the whitespace around each value, but not within it?

@Ugo_Sangiorgi - who had helped in Slack

Maybe a quick and dirty solution would be to scan the whole string for \s" and "\s (whitespace right before or right after a quote) and replace each match with a bare quote using gsub?

input {

    java_generator {
        lines => ['{"city": "My town ","comments": "all is well","date":"2023-01-26T00:00:00.000Z","email":"myuser@example.com","firstname":"User ","lastname":"Name"}'
        ]
        count => 1
    }
}

filter {
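    mutate {
        # gsub patterns are regular expressions; entries come in [field, pattern, replacement] triples.
        # The first removes whitespace just before a quote ("My town " -> "My town"),
        # the second removes whitespace just after a quote (" User" -> "User").
        gsub => [
            'message', '\s+"', '"',
            'message', '"\s+', '"'
        ]
    }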

    json {
        source => "message"
    }
}

output {
    #logshark
    elasticsearch {
        hosts => ["http://host.docker.internal:9200"]
    }
}

@Ugo_Sangiorgi That does not appear to work for us.

The output is not going to Elasticsearch but to another SQS queue upstream, and it has to keep the same structure on output.

I'm still seeing the same spaces in my attempts.

The destination should not matter, since the substitution happens on the raw message, before any parsing. Do you have an example where it is not working?
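Since the output here is SQS rather than Elasticsearch, one thing that may be worth checking (an assumption about the setup, not something confirmed in this thread): the cleaned JSON lives in the message field, but a codec that serializes the whole event will also send @timestamp, @version and any fields the json filter added. As far as I know the sqs output defaults to the json codec, so a plain codec with a format string is one way to forward only the cleaned payload. The queue name and region below are placeholders.

```
output {
    sqs {
        queue  => "downstream-queue"   # hypothetical queue name
        region => "us-east-1"          # hypothetical region
        # Send only the (already cleaned) message string as the SQS body,
        # instead of the whole event serialized as JSON.
        codec  => plain { format => "%{message}" }
    }
}
```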

Hey @mark54g

I'm not sure why mutate->strip wouldn't work... Is the problem that you want to keep the JSON as a string, rather than as fields in the document? If so, how about this:

input {
    generator {
        lines => ['{"city": "My town ","comments": "all is well","date":"2023-01-26T00:00:00.000Z","email":"myuser@example.com","firstname":"User ","lastname":"Name"}'
        ]
        count => 1
    }
}

filter {

    json {
        source => "message"
        target => "[@metadata][json]"
    }
    mutate {
        strip => [ "[@metadata][json][city]", "[@metadata][json][firstname]" ]
    }
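    # json_encode may need installing: bin/logstash-plugin install logstash-filter-json_encode
    # Encode only the parsed object so "message" keeps the original top-level structure;
    # encoding all of [@metadata] would wrap it as {"json": {...}}.
    json_encode {
        source => "[@metadata][json]"
        target => "message"
    }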
}

output {
    stdout {
       codec => rubydebug { metadata => true }
    }
}

Cheers,
-Robin-
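If the set of padded fields isn't known in advance, or the values sit deeper in the nested structure, listing every field under mutate strip gets unwieldy. Below is a sketch using the ruby filter (bundled with Logstash) instead. It assumes the whole payload arrives as a JSON string in message and should leave in message with the same structure. It parses the string, trims every string value at any depth, and re-serializes it. It is untested against the real SQS feed. Keys are left alone, and JSON.generate writes compact output (no space after colons), which is equivalent JSON but not byte-identical to the input.

```
filter {
    ruby {
        # Load Ruby's standard JSON library once, when the pipeline starts.
        init => "require 'json'"
        code => '
            # Recursively trim leading/trailing whitespace from every string value,
            # descending into nested objects and arrays; keys and non-string values are untouched.
            strip_all = lambda do |obj|
                case obj
                when Hash   then obj.each_with_object({}) { |(k, v), h| h[k] = strip_all.call(v) }
                when Array  then obj.map { |v| strip_all.call(v) }
                when String then obj.strip
                else obj
                end
            end

            # Parse the raw payload, clean it, and write it back as a JSON string.
            parsed = JSON.parse(event.get("message"))
            event.set("message", JSON.generate(strip_all.call(parsed)))
        '
        # If message is not valid JSON, the exception is caught, the event is tagged
        # with _rubyexception (the default tag_on_exception), and message is left as-is.
    }
}
```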

