I have a simple custom grok pattern that "normalises" a full version field down to just its major and minor release numbers. Some examples to give you an idea:
1.2(3)U2 => 1.2
1.2.3.12056 => 1.2
7.1 => 7.1
7 => 7.0
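To make the intended mapping concrete outside Logstash, here is a rough Python sketch of the same rule. It assumes the two grok patterns are tried in order and the first match wins (grok's default `break_on_match` behaviour), followed by the conditional that appends ".0". The function name `normalise_version` is hypothetical and is not part of the Logstash setup.

```python
import re
from typing import Optional

# The same two regexes used in the grok filter, tried in order.
# Like grok with its default break_on_match, stop at the first match.
PATTERNS = [re.compile(r"^\d+$"), re.compile(r"^\d+\.\d+")]


def normalise_version(version: str) -> Optional[str]:
    """Hypothetical helper: reduce a full version string to major.minor."""
    for pattern in PATTERNS:
        match = pattern.match(version)
        if match:
            value = match.group(0)
            # Mirrors the conditional mutate: a bare major version gets ".0".
            if re.fullmatch(r"\d+", value):
                value += ".0"
            return value
    # Neither pattern matched (grok would tag the event _grokparsefailure).
    return None


for v in ["1.2(3)U2", "1.2.3.12056", "7.1", "7"]:
    print(v, "=>", normalise_version(v))
```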
I have a test config, run from the command line, that reads from stdin, applies the filter and writes to stdout. This works as expected. The full Logstash config is here:
input {
  stdin { }
}
filter {
  grok {
    match => { "message" => [
      "(?<version_normalised>^\d+$)",
      "(?<version_normalised>^\d+\.\d+)" ]
    }
  }
  if [version_normalised] =~ /^\d+$/ {
    mutate {
      replace => { "version_normalised" => "%{version_normalised}.0" }
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
Example run:
1.2(3)U2
{
            "@timestamp" => 2017-09-14T05:41:26.562Z,
    "version_normalised" => "1.2",
              "@version" => "1",
                  "host" => "ccbu-reporting",
               "message" => "1.2(3)U2"
}
7.1
{
            "@timestamp" => 2017-09-14T05:41:36.944Z,
    "version_normalised" => "7.1",
              "@version" => "1",
                  "host" => "ccbu-reporting",
               "message" => "7.1"
}
7
{
            "@timestamp" => 2017-09-14T05:41:38.412Z,
    "version_normalised" => "7.0",
              "@version" => "1",
                  "host" => "ccbu-reporting",
               "message" => "7"
}
I then use this working filter in my main Logstash config. The main differences are that the input comes from the jdbc plugin (via a database query), grok matches against the "software_versions" column instead of "message", and the output goes to the elasticsearch plugin. The full config is:
input {
  jdbc {
    # JDBC connection details omitted
    # Each row of results will include a column with name "software_versions"
  }
}
filter {
  grok {
    match => { "software_versions" => [
      "(?<version_normalised>^\d+$)",
      "(?<version_normalised>^\d+\.\d+)" ]
    }
  }
  if [version_normalised] =~ /^\d+$/ {
    mutate {
      update => { "version_normalised" => "%{version_normalised}.0" }
    }
  }
}
output {
  # ElasticSearch output plugin
  elasticsearch {
    hosts => ["http://x.x.x.x:9200"] # IP address omitted
    id => "logstash-evt"
    index => "logstash-evt-%{+xxxx.ww}"
    document_type => "evt"
    document_id => "%{evt_id}"
  }
}
However, in Elasticsearch the "version_normalised" field has the correct value, but it is duplicated in an array. For example, here are the relevant fields from the event's raw JSON as shown in Kibana:
"software_versions": "1.2(3)",
"version_normalised": [
  "1.2",
  "1.2"
],
I am not sure why this is happening, since the command-line test works as expected. Any suggestions would be appreciated, thank you.