Gsub for numbers?


(Jason) #1

This is a take on a previous topic I posted.

I have values coming across the wire that are meant to go into integer/double/float fields in Elastic, but they are coming across as string values.

Within Logstash, I need to convert these values to null. I can do this with gsub for strings. Is there another way to do this for integers/doubles...?


(Magnus Bäck) #2

Not sure exactly what you mean. If your fields indeed are numbers, why would you need to turn them into null? And if they're strings, why doesn't the gsub option work? Perhaps you can give an example or two of input message and the expected output.


(Jason) #3

So the issue comes when the fields are numbers, but the data has become corrupted, which happens. If I try to forward the mistyped data to Elastic, I get the dreaded Error 400, which pukes into my log files (and can quickly fill up the drive on my logstash shipper).

I need Logstash to normalize the data before it goes to Elastic. I am using templates in Elastic, so it is expecting specific data types.

Say I have a field that is sent to logstash that is meant to be a number (i.e. {"my_int" => "32" }), but it ends up being bad data (i.e. {"my_int" => "[ ]"}). I can use grok/regex to find the data in the data stream, but there's nothing that lets me normalize data that may be bad (from a number standpoint).

I need something in logstash to ensure that I send {"my_int" => null} so that Elastic doesn't throw a fit.

Right now, I can use gsub for strings, but don't have anything for numbers.


(Magnus Bäck) #4

Again, an example would help.

Regexp conditionals should work fine:

if [message] !~ /^[0-9]+$/ {
  ...
}

Or, tighten your grok expression to only match numbers, and add the fields with any value you like if they're missing.
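For instance (a sketch only; the field name my_int is assumed, not taken from your config), a grok pattern that only matches numeric values and casts the capture to an integer would look something like:

grok {
  match => { "message" => "%{NUMBER:my_int:int}" }
}

If the value isn't numeric the pattern doesn't match, so the field is never created and no bad value reaches Elasticsearch.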

I don't know what your gsub looks like so I don't understand why it doesn't work with numbers.


(Joshua Rich) #5

Why do you need to store the field at all for a document where it is corrupted? Couldn't you just use a remove_field parameter and drop the field altogether?


(Jason) #6

So how do I do a conditional remove_field? I only want to remove the field if it meets a certain condition.


(Jason) #7

So, for example, I have a mutate like the following:

mutate {
  gsub => [ "my_int_value", "(-|\[\])", "null" ]
}

This is not working. I am still getting "-" values in the fields getting sent to Elastic, instead of null.

What happens if I do tighten the grok and the incoming message doesn't meet the exact match? Does it throw out the whole message? I just want to throw out single bad fields.


(Joshua Rich) #8

If you've extracted the field with a grok filter, you can use a Logstash conditional to remove it with a mutate filter:

if [field] !~ /^[0-9]+$/ {
  mutate {
    remove_field => [ "field" ]
  }
}

Just put something like that after the grok that extracts/creates that field.

For a grok that doesn't match, the message will be tagged with _grokparsefailure. You can process those messages later in the Logstash pipeline if you want.
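As a sketch of that (how you handle the tagged events is up to you), a conditional on the tags array could drop such messages entirely:

if "_grokparsefailure" in [tags] {
  drop { }
}

Or you could keep the event and only strip the suspect field with a mutate, as in the earlier example.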

