Logstash Ruby Parse CSV and Validate Its Fields


(Robert Slama) #1

Hello,
I am trying to monitor a directory, parse CSV files, and validate some of the fields' values. I thought I could do the following, but it failed. Any suggestions?
I am open to alternatives.

input {
  file {
    path => "C:/test/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    #my fields
    columns => ["field1","field2","field3","field4"]
  }
  ruby {
    code => "
      #my array for validation
      ary = ["value1","value2","value3"]
      if ary.include? (event["field2"])
        #create new field, assign invalid as value
        event["field5"] = "Invalid"
      else
        #create new field, assign valid as value
        event["field5"] = "Valid"
      end

  }

}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "test_index"
  }
  stdout {}
}
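For reference, here is a minimal sketch of the same check written against the event.get/event.set API that Logstash 5.0 and later use in place of the older event["field"] syntax. A real LogStash::Event only exists inside Logstash, so the sketch uses a hypothetical StubEvent class that has just those two methods and can run standalone. The body of flag_field2 is the part that would go inside the ruby filter's code option. It uses only single quotes, so the option itself can be wrapped in double quotes.

```ruby
# Hypothetical stand-in for LogStash::Event, exposing only get/set.
class StubEvent
  def initialize(fields = {})
    @fields = fields
  end

  def get(name)
    @fields[name]
  end

  def set(name, value)
    @fields[name] = value
  end
end

# Values to flag; same list as in the config above.
FLAGGED_VALUES = ['value1', 'value2', 'value3'].freeze

# Same logic as the config above: a listed value is marked 'Invalid'.
def flag_field2(event)
  if FLAGGED_VALUES.include?(event.get('field2'))
    event.set('field5', 'Invalid')
  else
    event.set('field5', 'Valid')
  end
end

e = StubEvent.new('field2' => 'value2')
flag_field2(e)
puts e.get('field5') # prints Invalid
```

In an actual ruby filter, the list could be defined once through the filter's init option rather than being rebuilt for every event.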


(Magnus Bäck) #2

For best results when asking a question, always quote any error messages or give other details that point at the problem.

Your problem is most likely that you're nesting double quotes inside double quotes in your ruby filter.

Do:

code => " ... ' ... "
code => ' ... " ... '

Do not:

code => " ... " ... "
code => ' ... ' ... '
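In short, wrap code in whichever quote character the Ruby body does not use. Below is a small hypothetical helper that makes that choice explicit. It is an illustration only and not part of Logstash.

```ruby
# Hypothetical helper: pick a quote character for the ruby filter's
# `code` option that does not appear anywhere inside the Ruby body.
def code_delimiter(body)
  return '"' unless body.include?('"')
  return "'" unless body.include?("'")
  nil # body uses both kinds: rewrite it to use only one kind of quote
end

body = "event.set('field5', 'Valid')"
delim = code_delimiter(body)
puts "code => #{delim}#{body}#{delim}"
# prints: code => "event.set('field5', 'Valid')"
```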

(Robert Slama) #3

Hi Magnus,

Sorry, I wasn't clear. I am trying to validate field values while parsing and posting to Elasticsearch/Kibana.
I am not sure what the best way to achieve that is. My end goal is to validate all the data I am posting: some fields have a list of valid values, some are dates only, others are integers, etc.
My first attempt handled the field with a list of values, and it worked (shown below). It would be better if I knew how to define an array to validate against.
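One way to organize the kinds of checks described above (allowed values, date-only, integer) is a table of per-field rules. Below is a minimal standalone Ruby sketch. The field-to-rule assignment, the allowed values, and the YYYY-MM-DD date format are assumptions for illustration. Note that here the list holds allowed values.

```ruby
require 'date'

# Hypothetical per-field rules; each lambda returns true when the value is valid.
RULES = {
  # allowed-values list
  'field2' => ->(v) { ['value1', 'value2', 'value3'].include?(v) },
  # optional sign followed by digits only
  'field3' => ->(v) { v.to_s.match?(/\A[+-]?\d+\z/) },
  # assumed date-only format YYYY-MM-DD; invalid dates such as Feb 30 fail
  'field4' => lambda { |v|
    begin
      Date.strptime(v, '%Y-%m-%d')
      true
    rescue ArgumentError, TypeError
      false
    end
  }
}.freeze

# Returns the names of fields whose values fail their rule.
def invalid_fields(record)
  RULES.reject { |field, rule| rule.call(record[field]) }.keys
end

row = { 'field2' => 'value1', 'field3' => '42', 'field4' => '2017-02-30' }
p invalid_fields(row) # prints ["field4"]
```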

Next, I need to check field value formatting. For example, [field1] should be formatted like '0123-45A-678'.
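Here is a possible pattern for that example. It assumes the layout is four digits, a hyphen, two digits followed by one letter, a hyphen, then three digits. That layout is inferred from the single example value only. The standalone Ruby check:

```ruby
# Assumed layout: 4 digits, '-', 2 digits + 1 letter, '-', 3 digits.
# \A and \z anchor the whole string (^ and $ only anchor lines in Ruby).
FIELD1_FORMAT = /\A\d{4}-\d{2}[A-Za-z]-\d{3}\z/

def field1_format_ok?(value)
  FIELD1_FORMAT.match?(value.to_s)
end

puts field1_format_ok?('0123-45A-678')  # prints true
puts field1_format_ok?('0123-45A-6789') # prints false
```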

Below is what I found easier than using Ruby:

if ([field_to_validate] in ['value1', 'value2', 'value3']) {
  mutate { add_field => { "validation" => "invalid" } }
}
else {
  mutate { add_field => { "validation" => "valid" } }
}


(Guy Boertje) #4

Have a look at the translate filter. It offers a lookup solution. You may need to use two translate filters with an if block to exclude invalid values from a regex check.

Set the exact config option to true for both.

Value Validation
It allows you to define a list of invalid -> flag pairs - match against the first and add the second as a value to the @target field.
e.g.

"foo", "invalid",
"bar", "invalid"
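In plain Ruby, an exact-match dictionary lookup like this amounts to a hash lookup with a default. The sketch below only imitates that behaviour for illustration and is not the translate filter's code. The 'valid' default stands in for what the filter's fallback option would supply.

```ruby
# Invalid value -> flag pairs, as in the dictionary above.
INVALID_VALUES = { 'foo' => 'invalid', 'bar' => 'invalid' }.freeze

# Exact match only: 'food' does not match 'foo'.
def value_flag(value, fallback = 'valid')
  INVALID_VALUES.fetch(value, fallback)
end

puts value_flag('foo')  # prints invalid
puts value_flag('food') # prints valid
```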

Format Validation
Set the regex config option to true.
It allows you to define a list of regex -> flag pairs - match the regex and add the flag to the @target.
e.g.

"^\d{4}-\d{2}[A-Za-z]-\d{3}$", "format valid"
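Below is a rough Ruby imitation of the two-stage flow described at the top of this reply. The regex check only runs for values that the exact lookup did not flag, and the first matching pattern wins. It mimics the idea, not the translate filter's internals. The letter position uses [A-Za-z] because [A-z] would also accept the ASCII punctuation between Z and a (such as _). The 'format invalid' label is a hypothetical fallback.

```ruby
# Stage 1: exact invalid-value lookup.
INVALID_VALUES = { 'foo' => 'invalid', 'bar' => 'invalid' }.freeze

# Stage 2: regex -> flag pairs; the first pattern that matches wins.
FORMAT_FLAGS = {
  /\A\d{4}-\d{2}[A-Za-z]-\d{3}\z/ => 'format valid'
}.freeze

def validate(value)
  flag = INVALID_VALUES[value]
  return flag if flag # values flagged in stage 1 skip the regex check

  _pattern, format_flag = FORMAT_FLAGS.find { |pattern, _| pattern.match?(value.to_s) }
  format_flag || 'format invalid'
end

puts validate('foo')          # prints invalid
puts validate('0123-45A-678') # prints format valid
puts validate('0123-45_-678') # prints format invalid
```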

The dictionaries can be loaded from a file with periodic refresh if you want to change them.
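The translate filter reads such files through its dictionary_path option (YAML, JSON or CSV) and re-reads them every refresh_interval seconds. To show what a YAML dictionary becomes once loaded, here is a sketch that parses one with Ruby's standard yaml library. The file contents are an assumption, and an inline string stands in for the file.

```ruby
require 'yaml'

# Hypothetical contents of a YAML dictionary file (inline so the sketch
# runs without a file on disk).
DICTIONARY_TEXT = <<~DICT
  "foo": "invalid"
  "bar": "invalid"
DICT

# Parses dictionary text into a plain Hash of value -> flag
# (an empty document yields an empty Hash).
def load_dictionary(text)
  YAML.safe_load(text) || {}
end

dictionary = load_dictionary(DICTIONARY_TEXT)
puts dictionary.fetch('foo') # prints invalid
```

With a real file on disk, the equivalent call would be load_dictionary(File.read(path)).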

Use this site to build regexes: http://rubular.com/


(Magnus Bäck) #5

> Sorry I wasn't clear. I am trying to validate a field value during parsing/posting to elastic/kibana.

No, that certainly wasn't clear, but the configuration you posted nevertheless has a problem with nested quoting.


(Robert Slama) #6

You are correct! This was just an example of what I am trying to achieve. I was able to fix the quoting error.
As I mentioned, I am new at this; your help is highly appreciated.

Now I am trying to figure out how to plug in my regex to validate the fields' formats and values.


(Robert Slama) #7

Thank you!
I think this is very close to what I am looking for. I just need to read a little more on how to plug in the regex.
I cannot find sample code that sets both regex and exact to true, and I am not sure whether I need to add a ruby filter for that.

