One way to do this is to transform reference_2.csv into something that a translate filter can use. Suppose we start off with
col1,col2,"Other","Stuff"
3,255,Lorem ipsum dolor sit amet,consectetur adipiscing elit
2,256,sed do eiusmod tempor incididunt,ut labore et dolore magna aliqua
and we run it through a configuration like this
input { stdin {} }
filter {
    csv { autodetect_column_names => true target => "object" }
    mutate { rename => { "[object][col2]" => "[key]" } }
}
output { stdout { codec => line { format => '"%{key}": %{object}' } } }
using
/usr/share/logstash/bin/logstash -f /path/to/file.conf --path.settings /etc/logstash < reference_2.csv > dictionary.yml
that gets you a file that looks like this
"255": {"Stuff":"consectetur adipiscing elit","Other":"Lorem ipsum dolor sit amet","col1":"3"}
"256": {"Stuff":"ut labore et dolore magna aliqua","Other":"sed do eiusmod tempor incididunt","col1":"2"}
Note that we are not using numeric keys; the pipeline converts them to strings.
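This works because the translate filter selects its dictionary format from the file extension, and YAML is a superset of JSON, so each of those JSON-style values loads as a nested hash.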
If you then configure a translate filter to use that dictionary, it looks up the string "255" (add_field always adds strings, never integers) and parses the JSON for you:
mutate { add_field => { "key" => 255 } }
translate { dictionary_path => "/home/user/dictionary.yml" field => "key" destination => "[@metadata][dict]" }
mutate { add_field => { "stuff" => "%{[@metadata][dict][Stuff]}" } }
results in the following (shown with stdout { codec => rubydebug { metadata => true } }, since rubydebug does not print @metadata by default)
"@metadata" => {
"dict" => {
"Other" => "Lorem ipsum dolor sit amet",
"col1" => "3",
"Stuff" => "consectetur adipiscing elit"
}
},
"key" => "255",
"stuff" => "consectetur adipiscing elit"
There is another way to do this, which I am still thinking about, but this would work.