Gsub remove words less than 4 characters in length

I am attempting to use a mutate gsub to remove words that are fewer than 4 characters long.

mutate { gsub => [ "[tag][cloud][chat]", "(\S+){1,3}", "" ] }

But it is removing everything. Any suggestions?

I am also attempting to remove URLs but that doesn't seem to be working either.

mutate { gsub => [ "[tag][cloud][chat]", "(http:\S+)", "" ] }
mutate { gsub => [ "[tag][cloud][chat]", "(https:\S+)", "" ] }

But these two gsubs ARE working.

mutate { gsub => [ "[tag][cloud][chat]", "(\[emote=\S+\])", "" ] }
mutate { gsub => [ "[tag][cloud][chat]", "(@\S+)", "" ] }

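In that pattern \S+ already matches an entire word on its own, so every word is removed whatever its length. Dropping the + does not help either: with "\S{1,3}" and a word of seven characters, I would expect the pattern to remove 3, then 3 again, and then 1, which removes the entire word. You need to anchor the pattern.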

Perhaps "(^|\b)\S{1,3}(\b|$)" ?...
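To see the difference outside Logstash, here is a minimal sketch in Python. It assumes Python's re module treats these particular patterns the same way as the Ruby regex engine behind mutate gsub, which should hold for plain ASCII text. The sample sentence is made up for illustration.

```python
import re

# Made-up sample text (an assumption, not data from the thread).
text = "the quick brown fox is on a hilltop"

# Unanchored: \S+ swallows each whole word, so no word survives.
unanchored = re.sub(r"(\S+){1,3}", "", text)

# Anchored at word boundaries: only words of 1 to 3 characters match.
anchored = re.sub(r"(^|\b)\S{1,3}(\b|$)", "", text)

# Removing words leaves runs of spaces behind, so collapse them.
cleaned = " ".join(anchored.split())

print(repr(unanchored.strip()))  # ''
print(cleaned)                   # quick brown hilltop
```

Note that the anchored pattern leaves the surrounding spaces in place, so a second gsub that collapses repeated whitespace may be worth adding if that matters for the field.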

That worked for the words less than 4 characters.

Thank you!!!

Do you have a suggestion for removing URLs? Nothing seems to work for me at this point. I was trying to cover http and https, but I will need to cover ftp as well.

I suggest you look at the grok patterns for URLs and maybe extract parts of those. A pattern for a complete URL would be

[A-Za-z]([A-Za-z0-9+\-.]+)+://(?:[a-zA-Z0-9._-]+(?::[^@]*)?@)?(?:(?:(?:(?:((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?|(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9]))|\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)))(?::\b(?:[1-9][0-9]*)\b)?)?(?:(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%&_\-]*)+(?:\?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*)?)?

:smiley:
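If the full grok pattern is more than you need, a much shorter pattern may be enough for chat text. The sketch below is in Python and rests on some assumptions: the URL pattern (a listed scheme, then "://", then everything up to the next whitespace) is a simplification I made up and not the grok definition, the clean helper and the sample strings are hypothetical, and Python's re is standing in for the Ruby regex engine that gsub uses. It also shows one possible reason a URL gsub can appear to do nothing: if the short-word pattern runs first, it strips pieces such as "://" out of the URL, so the URL pattern no longer matches.

```python
import re

# Hypothetical simplified URL pattern: http, https or ftp, then "://",
# then every non-whitespace character that follows.
URL = re.compile(r"\b(?:https?|ftp)://\S+")

# The anchored short-word pattern suggested earlier in the thread.
SHORT_WORD = re.compile(r"(^|\b)\S{1,3}(\b|$)")

def clean(text):
    """Remove URLs first, then short words, then collapse whitespace."""
    text = URL.sub("", text)
    text = SHORT_WORD.sub("", text)
    return " ".join(text.split())

sample = "look at https://example.com/a?b=1 or ftp://files.example.org/x please"
print(clean(sample))  # look please

# Wrong order: the short-word pattern removes "://" before the URL pattern runs,
# so fragments of the URL are left behind.
mangled = URL.sub("", SHORT_WORD.sub("", "see http://example.com/a now"))
print(repr(mangled))
```

In Logstash terms this would mean placing the URL gsub before the short-word gsub, using a pattern along the lines of "(?:https?|ftp)://\S+".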

I actually have it working now :slight_smile:
