I'm using a libpostal REST interface, running from a Docker image, to parse addresses that our systems have captured.
I submit a request like this:
curl -X POST -d '{"query": "100b sydney st buffalo ny"}' localhost:8080/parser | python -mjson.tool
And I get a response like this. The number of label/value pairs depends on the format and length of the address (sometimes the same label appears more than once for compound addresses).
[ { "label": "house_number", "value": "100b" }, { "label": "road", "value": "sydney st" }, { "label": "city", "value": "buffalo" }, { "label": "state", "value": "ny" } ]
Using the rest filter plugin (an HTTP filter) in Logstash I can submit millions of requests to this API. However, the response is not that useful because it is an array of separate label/value objects rather than a single flat object. Below is the filter plugin usage.
rest {
  request => {
    url => "http://xxxxx.net:8080/parser"
    method => "post"
    params => {
      "query" => "%{address}"
    }
  }
  json => true
  target => "addressParsed"
}
I would like to process the response in Logstash so that it looks more like this:
[ { "house_number" : "100b", "road" : "sydney st", "city" : "buffalo", "state":"ny" } ]
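To make the target shape concrete, here is a minimal Python sketch of the transformation I'm after, written outside Logstash. The function name flatten_labels is my own invention, and collecting repeated labels into a list is only an assumption about how duplicates could be handled; it is not something libpostal or Logstash does for me.

```python
import json


def flatten_labels(parsed):
    """Collapse a list of {"label": ..., "value": ...} pairs into one dict.

    Assumption: when a label repeats (compound addresses), its values are
    collected into a list instead of overwriting each other.
    """
    flat = {}
    for pair in parsed:
        label, value = pair["label"], pair["value"]
        if label not in flat:
            flat[label] = value
        elif isinstance(flat[label], list):
            flat[label].append(value)
        else:
            flat[label] = [flat[label], value]
    return flat


# Sample response copied from the parser output shown earlier.
response = json.loads(
    '[{"label": "house_number", "value": "100b"},'
    ' {"label": "road", "value": "sydney st"},'
    ' {"label": "city", "value": "buffalo"},'
    ' {"label": "state", "value": "ny"}]'
)

print(json.dumps([flatten_labels(response)]))
```

Whatever filter ends up doing this in Logstash would need to do the equivalent of that loop over the addressParsed field.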
I think this is what the kv filter is meant to help with (though I'm not really sure).
I have been experimenting with a Logstash config that looks something like this, without any luck:
kv {
  allow_duplicate_values => true
  source => "%{addressParsed}"
  target => "addressKV"
  recursive => "true"
  field_split => ","
  value_split => ":"
  include_brackets => false
}
Can somebody confirm that I am heading in the right direction?