Not sure exactly what you mean. If your fields indeed are numbers, why would you need to turn them into null? And if they're strings, why doesn't the gsub option work? Perhaps you can give an example or two of an input message and the expected output.
So the issue comes when the fields are numbers, but the data has become corrupted, which happens. If I try to forward the mistyped data to Elastic, I get the dreaded Error 400, which pukes into my log files (and can quickly fill up the drive on my logstash shipper).
I need logstash to normalize the data before it goes to Elastic. I am using templates in Elastic, so it is expecting specific data types.
Say I have a field that is sent to logstash that is meant to be a number (e.g. {"my_int" => "32"}), but it ends up being bad data (e.g. {"my_int" => "[ ]"}). I can use grok/regex to find the data in the data stream, but there's nothing that lets me normalize data that may be bad (from a number standpoint).
I need something in logstash to ensure that I send {"my_int" => null} so that Elastic doesn't throw a fit.
Right now, I can use gsub for strings, but don't have anything for numbers.
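The closest thing I can picture is dropping down to a ruby filter and sanity-checking the field myself. Something like this rough sketch (untested; my_int is just my example field, and it assumes a Logstash version with the event.get/event.set Ruby API):

```
filter {
  ruby {
    code => "
      v = event.get('my_int')
      # If the field is present but not a plain integer, null it out
      # so the mistyped value does not make Elasticsearch reject the document.
      if !v.nil? && v.to_s !~ /^-?[0-9]+$/
        event.set('my_int', nil)
      end
    "
  }
}
```

It would be nice if there were a built-in mutate option for this, the way gsub covers strings.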
Why do you need to store the field at all for the document where it is corrupted? Couldn't you just use a remove_field parameter and drop the field altogether?
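For example, something roughly like this (the "[ ]" check is just a guess at what your corrupted values look like; adjust the condition to match your actual bad data):

```
filter {
  if [my_int] == "[ ]" {
    mutate {
      remove_field => [ "my_int" ]
    }
  }
}
```

With the field gone, Elasticsearch simply indexes the document without it, which is usually equivalent to sending null as far as the mapping is concerned.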
This is not working. I am still getting "-" values in the fields getting sent to Elastic, instead of null.
What happens if I do tighten the grok, and now the message coming in does not meet the exact match? Does it throw out the whole message? I just want to throw out single bad fields.
Just put something like that after the grok that extracts/creates that field.
For a grok that doesn't match, the message will be tagged with _grokparsefailure. You can process those messages later in the Logstash pipeline if you want.
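As a sketch, you could route those tagged events away from your normal Elasticsearch output; the file path, hosts, and index name below are just placeholders:

```
output {
  if "_grokparsefailure" in [tags] {
    # Keep events the grok couldn't parse out of Elasticsearch
    # and write them to a file for later inspection.
    file {
      path => "/var/log/logstash/grok-failures.log"
    }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "logstash-%{+YYYY.MM.dd}"
    }
  }
}
```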