Create arrays based on certain text in a field

I am parsing chat messages and I want to create three arrays for tag cloud purposes.

One for normal chat words, one for user tags, and one for chat emotes.

For example, take this chat message:
WE LOVE YOU @CHANDLER, YOUR GREAT AT EVERYTHING!! <3 <3 VirtualHug VirtualHug :) :WE LOVE YOU CHANDLER, YOUR GREAT AT EVERYTHING!! [emote=<3] <3 [emote=VirtualHug] VirtualHug [emote=:)] :)

Based on this message, I'd like to pull out the following:

  • user tag(s) : @CHANDLER
  • emotes : [emote=<3], [emote=VirtualHug], [emote=:)]

I have tried various examples I found on these forums, but none of them work, and each attempt brings all of Logstash down.

I explained how to do that in a reply to another of your posts. Did you have an issue with that solution? If so, what is the issue?

Can you better explain how to use this code?
ruby { code => 'event.set("matches", event.get("message").scan(/@\w+/))' }

I am reading a field called channel_message. I want to create the fields tag_cloud_emote, tag_cloud_chat, and tag_cloud_tags.

If you have two things you want to scan for I would slightly change that code:

    ruby {
        code => '
            msg = event.get("channel_message")
            if msg
                event.set("tag_cloud_tags", msg.scan(/@\w+/))
                event.set("tag_cloud_emote", msg.scan(/\[emote=[^]]*\]/))
            end
        '
    }

which will get you

"tag_cloud_tags" => [
    [0] "@CHANDLER"
],
"tag_cloud_emote" => [
    [0] "[emote=<3]",
    [1] "[emote=VirtualHug]",
    [2] "[emote=:)]"
],

Just insert that ruby filter into the filter section of your configuration file.
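
To see what the two scan calls return, here is a minimal plain-Ruby sketch that runs outside Logstash. The sample message is an assumption (a simplified version of the example in the question), and the character class is written with the bracket escaped, which should match the same text as the pattern above.

```ruby
# Plain Ruby, no Logstash required. The message below is an assumed,
# simplified version of the chat line from the question.
msg = "WE LOVE YOU @CHANDLER, YOUR GREAT AT EVERYTHING!! " \
      "[emote=<3] [emote=VirtualHug] [emote=:)]"

# String#scan returns an array holding every non-overlapping match.
tags   = msg.scan(/@\w+/)               # "@" followed by word characters
emotes = msg.scan(/\[emote=[^\]]*\]/)   # "[emote=" up to the next "]"

p tags
p emotes
```

Inside the ruby filter, the same two arrays are what event.set stores in tag_cloud_tags and tag_cloud_emote.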

Thank you sir.

How do I capture just the chat words (the ones that are not tags or emotes)?

You could try something like

mutate { add_field => { "tag_cloud_chat" => "%{channel_message}" } }
mutate {
    gsub => [
        "tag_cloud_chat", "\[emote=[^]]*\]", "",
        "tag_cloud_chat", "@\w+", "",
        "tag_cloud_chat", " +", " "
    ]
}

Removing the extra spaces with a third gsub is just easier than trying to add spaces to the other gsubs and handling corner cases where those spaces do not exist.
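
The same strip-then-tidy idea can be sketched in plain Ruby to check what is left over. This is only an illustration under assumptions: the sample message is simplified, the sketch collapses any run of spaces rather than exactly two, and the final strip is an extra step that the mutate version does not perform.

```ruby
# Plain Ruby, no Logstash required. Sample message is an assumption.
msg = "WE LOVE YOU @CHANDLER, YOUR GREAT AT EVERYTHING!! " \
      "[emote=<3] [emote=VirtualHug] [emote=:)]"

chat = msg.gsub(/\[emote=[^\]]*\]/, "")  # remove emote markers
chat = chat.gsub(/@\w+/, "")             # remove user tags
chat = chat.gsub(/ +/, " ")              # collapse runs of spaces into one
chat = chat.strip                        # trim the ends (extra step, not in the mutate version)

words = chat.split(" ")                  # one array element per chat word

p chat
p words
```

Note that punctuation left behind by a removed tag (the comma after @CHANDLER here) survives as its own word, so a tag cloud may need one more cleanup pass.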

Badger, this appears to be working great.

How can I, while in the ruby filter, strip the surrounding [emote=...] wrapper and keep only the captured value?

Also, here is the current code I am using in Logstash. Some field names changed because of other application requirements.

##########
# PARSE TAG CLOUD [CHAT]
##########
if [tag][cloud][chat] =~ /^.+$/ {
  mutate { gsub => [ "[tag][cloud][chat]", "(\[emote=\S+\])", "" ] }
  mutate { gsub => [ "[tag][cloud][chat]", "(@\S+)", "" ] }
  mutate { split => { "[tag][cloud][chat]" => " " } }
}

##########
# PARSE TAG CLOUD [EMOTE]
##########
if [channel][msg][text] =~ /(?i)\[emote=\S+\]/ {
  ruby {
    code => '
      msg = event.get("[channel][msg][text]")
      if msg
        event.set("[tag][cloud][emote]", msg.scan(/\[emote=[^]]*\]/))
      end
    '
  }
  if [tag][cloud][emote] =~ /(?i)\[emote=\S+\]/ {
    mutate { gsub => [ "[tag][cloud][emote]", "\[emote=(\S+)\]", "\1" ] }
  }
}

##########
# PARSE TAG CLOUD [TAG]
##########
if [channel][msg][text] =~ /(?i)@\w+/ {
  ruby {
    code => '
      msg = event.get("[channel][msg][text]")
      if msg
        event.set("[tag][cloud][tag]", msg.scan(/@\w+/))
      end
    '
  }
}

What are your thoughts on this code?

Change the second scan to be

event.set("tag_cloud_emote", msg.scan(/\[emote=([^]]*)\]/).flatten)
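
The flatten matters because scan behaves differently once the pattern contains a capture group: it returns one sub-array per match holding the captured text, not the whole match. A small plain-Ruby sketch, using an assumed sample message:

```ruby
# Plain Ruby, no Logstash required. Sample message is an assumption.
msg = "hello [emote=<3] [emote=VirtualHug] [emote=:)]"

# With a capture group, scan yields an array of arrays (one per match).
nested = msg.scan(/\[emote=([^\]]*)\]/)

# flatten turns that into a simple array of strings.
flat = nested.flatten

p nested
p flat
```

With the emote names extracted this way, a follow-up gsub to remove the [emote=...] wrapper should no longer be necessary.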

Thank you sir.

Just noticed something a little odd with the output in a Kibana visualization.

Tag             Count 
@TimTheTatman   255
@Asmongold      214
@timthetatman   105
@DrLupo         84
@NICKMERCS      75
@nickmercs      52
@drlupo         50

I guess I need to have the ruby filter force all matches to lowercase? If so, how?

You can use a mutate filter with the lowercase option to do it. If the field is an array, it will iterate over the members.
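
If you would rather normalize inside the ruby filter itself, one possible alternative (not the mutate approach described above) is to downcase each match right after the scan, for example msg.scan(/@\w+/).map(&:downcase). A plain-Ruby sketch with assumed sample tags shows how the mixed-case variants merge:

```ruby
# Plain Ruby, no Logstash required. The tag list is made-up sample data.
tags = ["@TimTheTatman", "@timthetatman", "@DrLupo", "@drlupo"]

# Lowercase every element so case variants become identical strings.
lowered = tags.map(&:downcase)

# Count occurrences to confirm that the variants now collapse together.
counts = Hash.new(0)
lowered.each { |tag| counts[tag] += 1 }

p lowered
p counts
```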

That did the trick. Thank you again sir!