I'm reading values from a Kafka topic and performing a few cleanup tasks. The data from Kafka has all fields as strings (surrounded by double quotes), even integers and floats, and I'm trying to convert them to their correct data types before placing them into an Elasticsearch index.
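For illustration only (these values and the extra field name are made up; only "payload" and "qty" correspond to my real data), an incoming message looks something like this, with every value quoted:

{"payload": {"qty": "10000", "price": "19.99"}}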
Originally I tried to accomplish this using mutate/convert, but realized that's not a good idea since the underlying data will still be encapsulated by double quotes, i.e. "10000" cannot be an integer.
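What I tried was roughly along these lines (a sketch, not my exact filter):

mutate {
  convert => { "qty" => "integer" }
}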
Now I'm trying to do this via mutate/gsub with regex lookbehinds, but it's still not working. Here's an example:
gsub => ["qty","\x22[\x22]", "\1"]
I used [\x22] (the ASCII hex code for a double quote) in place of literal double quotes, because the gsub statement already requires the field names and patterns to be wrapped in double quotes and I wasn't sure how the parser would react even with the inner quotes escaped.
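For comparison, I assume the escaped form of the same substitution would look like the line below, but I haven't verified how the config parser treats it:

gsub => ["qty", "\"[\"]", "\1"]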
Here's my full config file:
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["warehouse_deals"]
  }
}

filter {
  mutate {
    gsub => ["message", ".*[^\x00-\x7F]\s+", ""]
  }
  json {
    source => "message"
    target => "warehouse_deals"
  }
  mutate {
    gsub => ["qty", "\x22[\x22]", "\1"]
    gsub => ["payload.qty", "\x22[\x22]", "\1"]
    gsub => ["warehouse_deals.payload.qty", "\x22[\x22]", "\1"]
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "kafkatest"
    #user => "elastic"
    #password => "changeme"
  }
  stdout { codec => rubydebug }
}
The reason I specify "qty" three different ways is that I'm unsure whether this config file is read sequentially. Because the messages come from Kafka, all of the JSON sits in the message field, and "qty" is only exposed after it is parsed out. The source JSON has it inside a "payload" array, hence the middle value, and the json filter's target is "warehouse_deals", hence the third value.
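To make the nesting concrete, this is roughly how I picture the event after the json filter (a simplified sketch, not actual rubydebug output, and it treats payload as a single object), which is why I'm unsure which of the three paths actually reaches qty:

{
  "message" => "{\"payload\":{\"qty\":\"10000\"}}",
  "warehouse_deals" => {
    "payload" => {
      "qty" => "10000"
    }
  }
}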
I can provide sample data upon request. The source JSON values cannot be changed to simply not have double quotes around the numbers; I don't control the source values.