Flattening Libpostal label / value response


(swarmee.net) #1

OK, so I'm using a libpostal REST interface to parse some addresses that our systems have captured --> here is the docker image.

Basically I submit a request like this:

curl -X POST -d '{"query": "100b sydney st buffalo ny"}' localhost:8080/parser | python -mjson.tool

And I get a response like this. The number of label/value pairs depends on the format and length of the address (for compound addresses the same label sometimes appears more than once).

[
    {
        "label": "house_number",
        "value": "100b"
    },
    {
        "label": "road",
        "value": "sydney st"
    },
    {
        "label": "city",
        "value": "buffalo"
    },
    {
        "label": "state",
        "value": "ny"
    }
]

Using the http filter plugin in Logstash I can submit millions of requests to this API. However, the response is not that useful because it comes back as an array of label/value objects rather than one flat object. Below is how I'm using the filter plugin.

rest {
  request => {
    url => "http://xxxxx.net:8080/parser"
    method => "post"
    params => {
      "query" => "%{address}"
    }
  }
  json => true
  target => "addressParsed"
}

I would like to process the response in logstash to look more like this -->

[
    {
        "house_number": "100b",
        "road": "sydney st",
        "city": "buffalo",
        "state": "ny"
    }
]
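
To make the target shape concrete, here is a rough sketch of the reshaping in plain Ruby, since that is what a Logstash ruby filter runs. The function name `flatten_pairs` is made up for illustration, and turning a repeated label into an array of values is my own assumption about how compound addresses should be handled, not something libpostal or Logstash prescribes. It returns a single hash rather than the one-element array shown above.

```ruby
# Collapse libpostal's [{"label" => ..., "value" => ...}, ...] array into one hash.
# Assumption: when a label repeats, its values are collected into an array
# instead of the later value overwriting the earlier one.
def flatten_pairs(pairs)
  flat = {}
  pairs.each do |pair|
    label = pair["label"]
    value = pair["value"]
    next if label.nil? # skip malformed entries that have no label

    if flat.key?(label)
      # Array() wraps a single earlier value, or reuses an existing array
      flat[label] = Array(flat[label]) << value
    else
      flat[label] = value
    end
  end
  flat
end

# The sample response from above, as Ruby data
parsed = [
  { "label" => "house_number", "value" => "100b" },
  { "label" => "road",         "value" => "sydney st" },
  { "label" => "city",         "value" => "buffalo" },
  { "label" => "state",        "value" => "ny" }
]

flat = flatten_pairs(parsed)
puts flat.inspect
```

Inside a ruby filter the same loop would presumably read the array with `event.get("addressParsed")` and write the result back with `event.set(...)` under whatever target field name suits; I have not tested that wiring.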

I think this is what the kv filter is meant to help with (though I'm not really sure).
I have been experimenting with a Logstash config that looks something like this, without any luck -->

kv {
  allow_duplicate_values => true
  source => "%{addressParsed}"
  target => "addressKV"
  recursive => "true"
  field_split => ","
  value_split => ":"
  include_brackets => false
}

Can somebody confirm that I am heading in the right direction?

