Convert Inline Ruby to Ruby Script

I was given some inline Ruby code to do some work for me. I've put it to good use, but it makes my pipeline very messy, and I have a couple of questions. First, here's what this part of the pipeline looks like right now:

  if [extended_tweet][full_text] {
    ruby {
      code => "event.set('hashtags', event.get('[extended_tweet][full_text]').scan(/\#[a-z]*/i))"
    }
    ruby {
      code => "event.set('links', event.get('[extended_tweet][full_text]').scan(/https?:\/\/\S*/))"
    }
    ruby {
      code => "event.set('mentioned', event.get('[extended_tweet][full_text]').scan(/\@\S*/i))"
    }
    mutate {
      add_field => { "extended_tweet_original" => "%{[extended_tweet][full_text]}" }
    }
    mutate {
      gsub => [
        "[extended_tweet][full_text]", "\@\S*|\#\S*,?|https?:\/\/\S*,?", "",
        "[extended_tweet][full_text]", "&amp", "and",
        "text", "\@\S*|\#\S*,?|https?:\/\/\S*,?", "",
        "mentioned", "@", ""
      ]
      lowercase => ["hashtags"]
    }
  } else {
    ruby {
      code => "event.set('hashtags', event.get('text').scan(/\#[a-z]*/i))"
    }
    ruby {
      code => "event.set('links', event.get('text').scan(/https?:\/\/\S*/))"
    }
    ruby {
      code => "event.set('mentioned', event.get('text').scan(/\@\S*/))"
    }
    mutate {
      gsub => [
        "text", "\@\S*|\#\S*,?|https?:\/\/\S*,?", "",
        "text", "&amp", "and",
        "mentioned", "@", ""
      ]
      lowercase => ["hashtags"]
    }
  }
  • Would it be more efficient to move the code into two separate script files and call them when appropriate, rather than the way I am currently doing it?
  • How do I convert the inline code into a Ruby script file? Is anyone willing to convert one of these lines so that I can work out the rest from there?
  • If not a script file, is there a way to condense all three inline scripts so they run in a single ruby filter?

To address just the last question: you can simply combine them. The code option can span multiple lines.

ruby {
  code => "event.set('hashtags', event.get('[extended_tweet][full_text]').scan(/\#[a-z]*/i))
           event.set('links', event.get('[extended_tweet][full_text]').scan(/https?:\/\/\S*/))
           event.set('mentioned', event.get('[extended_tweet][full_text]').scan(/\@\S*/i))"
}

I tried this. It didn't throw any immediate errors, but Logstash didn't quite like it. Watching the log file, I would occasionally see:

[ERROR][logstash.filters.ruby ] Ruby exception occurred: undefined method `scan' for nil:NilClass

then all three fields for that event would be blank.
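
For what it's worth, that message means event.get returned nil for whichever field was read, and nil has no scan method. Because the exception aborts the whole code string, none of the three event.set calls after it take effect. Here is a minimal sketch in plain Ruby, outside Logstash, where the variable text is only a stand-in for the result of event.get on a missing field:

```ruby
# Stand-in for event.get('text') on an event that lacks the field (assumption).
text = nil
error_message = nil

begin
  # Same call as in the inline filter; raises because nil has no scan method.
  text.scan(/\#[a-z]*/i)
rescue NoMethodError => e
  error_message = e.message
end

# nil.to_s is "", so scan returns an empty array instead of raising.
hashtags = text.to_s.scan(/\#[a-z]*/i)
```

Calling to_s on the value before scan is one way to keep a missing field from aborting the rest of the code string; whether an empty array is the right result for such events is a judgment call.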

Are you absolutely certain that the events for which you get exceptions contain [extended_tweet][full_text]? Can you try something like this:

code => "
  ft = event.get('[extended_tweet][full_text]')
  if ft
    event.set('hashtags', ft.scan(/\#[a-z]*/i))
    event.set('links', ft.scan(/https?:\/\/\S*/))
    event.set('mentioned', ft.scan(/\@\S*/i))
  else
    event.set('missing_full_text', true)
  end"

Well no, it's not necessarily the [extended_tweet][full_text] field that the code is running on. The if statement looks for that field and, if it is present, runs the three lines of Ruby on it. If the field is not present, it runs the three lines of Ruby on the [text] field instead. So the error could be occurring on either one.

Sorry, I didn't read your code correctly. Is it possible that there are events that have neither [text] nor [extended_tweet][full_text]?

All events will have text; some events will also have extended_tweet.full_text. When extended_tweet.full_text is present, I need to perform actions on both it and text. When it is not, only the operations on text are performed. Regardless of which field is present, the values each Ruby snippet looks for (hashtags, links, mentions) may or may not be there.
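
On the script-file question from the original post: the ruby filter also accepts a path option that points to a script file. The file must define filter(event), which returns an array of events, and may define register(params). What follows is only a sketch under assumptions: the file name, the path in the comment, and the helper extract_entities are hypothetical, and the regular expressions are copied from the inline version. It reads [extended_tweet][full_text] when present and falls back to text, and it treats a missing value as an empty string.

```ruby
# tweet_entities.rb (hypothetical file name). Referenced from the pipeline as:
#   ruby { path => "/etc/logstash/scripts/tweet_entities.rb" }   # path is an assumption

# Hypothetical helper: pulls hashtags, links, and mentions out of one string.
def extract_entities(text)
  text = text.to_s # nil becomes "", so scan never runs on nil
  {
    'hashtags'  => text.scan(/\#[a-z]*/i),
    'links'     => text.scan(/https?:\/\/\S*/),
    'mentioned' => text.scan(/@\S*/)
  }
end

# Optional hook: Logstash calls it once with the filter's script_params.
def register(params)
end

# Logstash calls this for every event; it must return an array of events.
def filter(event)
  text = event.get('[extended_tweet][full_text]') || event.get('text')
  extract_entities(text).each { |field, values| event.set(field, values) }
  [event]
end
```

With this in place, a single ruby filter outside the conditional could replace all six inline ones. The mutate blocks (gsub, lowercase, add_field) would stay in the pipeline as they are.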
