Isolating Hashtags in Twitter Feed

Maybe my best bet is to start with my end goal and then someone can tell me if this is possible. I'd like to build Kibana visuals that show top mentions of hashtags as they appear in the [extended_tweet][full_text] field. It seems like the first step is to pull the 'hashtagged' words out of this field and stick them as an array into their own field. Is it then possible for visualizations to search and aggregate separate values in a field?

How would I isolate data in a text field and move it into another?
Can Kibana visualizations aggregate and visualize values that are part of an array in a field?

We do this via a ruby filter:

if [message] {
  ruby {
    code => "event.set('hashtags', event.get('message').scan(/\#[a-z]*/i))"
  }
}
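
For anyone unsure what that one-liner does: String#scan returns an array of every substring that matches the pattern, and the filter stores that array in a new hashtags field. Here is a minimal plain-Ruby sketch outside Logstash. The sample tweet is made up and the extract_hashtags helper name is hypothetical, used only for illustration:

```ruby
# Hypothetical helper mirroring the regex used in the filter; not part of Logstash.
def extract_hashtags(text)
  # scan returns every non-overlapping match as an array of strings
  text.scan(/\#[a-z]*/i)
end

sample = "Loving the #Elastic stack, #Kibana and #logstash rock"  # made-up tweet
hashtags = extract_hashtags(sample)
puts hashtags.inspect
# prints ["#Elastic", "#Kibana", "#logstash"]
```

Note that [a-z]* stops at digits and underscores, so a tag like #day1 would come out as #day. If that matters for your data, a pattern such as /#\w+/ may be worth trying instead.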

It could probably be folded back into the plugin, but this came via someone smarter than me and I am not sure how to submit it back as code :wink:


If I turn my head sideways and squint really hard....I can almost...nope, I still can't tell wtf that code is, lol.

Huzzah, it works! Thanks again, Mark.

  if [extended_tweet][full_text] {
    ruby {
      code => "event.set('hashtags', event.get('[extended_tweet][full_text]').scan(/\#[a-z]*/i))"
    }
  } else {
    ruby {
      code => "event.set('hashtags', event.get('text').scan(/\#[a-z]*/i))"
    }
  }

OH OH! Bonus question!!

Say I want to do the same thing...but with hyperlinks...would this be how you configure the "scan" parameters?

"event.set('links', event.get('[extended_tweet][full_text]').scan(/\http[a-z]*/i))"

OOOO.....and is it possible to instruct Kibana to make them clickable hyperlinks?

It looks ok, but I am not that good at regex or ruby.

It is, yeah.
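
One caveat on the scan pattern from the bonus question: in Ruby regexes \h is the hex-digit character class, so /\http[a-z]*/i looks for a hex digit followed by "ttp" rather than the literal text "http". A small sketch of the difference, using a made-up sample string:

```ruby
text = "Read this http://example.com/a and https://example.org/b"  # made-up sample

# \h means "one hex digit", so this needs e.g. "attp" and never matches "http"
hex_class = text.scan(/\http[a-z]*/i)
p hex_class   # prints []

# Without the backslash the literal word matches, but [a-z]* stops at the ':'
literal = text.scan(/http[a-z]*/i)
p literal     # prints ["http", "https"]
```

So dropping the backslash is necessary but not sufficient; the pattern also has to allow the characters that follow the scheme.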

Looks to be a little more complicated...so far I've come up with:

code => "event.set('links', event.get('text').scan(/(https:\/\/|http:\/\/)[a-z].*/i))"

but all that's being captured in the links field is http:// or https://. I also don't think this will work completely, as it will include everything after the http://. I need a delimiter to separate multiple links.

hmm....found an online regex builder and the builder seemed to like this, but it's still only putting http:// or https:// in the field:

code => "event.set('links', event.get('[extended_tweet][full_text]').scan(/(https:\/\/|http:\/\/)\S*/i))"
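
A likely explanation for only getting http:// or https:// back: when the pattern contains a capturing group, String#scan returns the captured groups instead of the full match. Dropping the parentheses (or making the group non-capturing with (?:...)) should return the whole URL, and \S+ stops at whitespace, which acts as the delimiter between multiple links. A minimal plain-Ruby sketch with a made-up sample string:

```ruby
text = "See https://example.com/a and http://example.org/b?x=1 now"  # made-up sample

# With a capturing group, scan returns only what the group captured
with_group = text.scan(/(https:\/\/|http:\/\/)\S*/i)
p with_group
# prints [["https://"], ["http://"]]

# Without a capturing group, scan returns each full match
links = text.scan(/https?:\/\/\S+/i)
p links
# prints ["https://example.com/a", "http://example.org/b?x=1"]
```

The second pattern should carry over to the ruby filter unchanged, i.e. .scan(/https?:\/\/\S+/i) in place of the grouped version, though I have only checked it in plain Ruby and not inside Logstash.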
