I think we have two options:
- modify or replace the avro codec; OR
- chain together filters (and write one of our own) to do the work of a codec
Codec option:
The existing avro codec receives a single base64-encoded string and then:
1. decodes that string into raw bytes
2. uses the avro gem to convert those bytes to a ruby Hash
3. creates the Logstash Event
From what I can tell, we're looking to inject a new step between #1 and #2 that decrypts cipherbytes into plainbytes. That way we can do all of the decoding in the codec and emit events that are fully contextualised from the get-go.
This will require modifications to the avro codec, but we could easily support a pipeline config that looked like this:
input {
  kafka {
    codec => avro {
      decrypt => {
        algorithm => "AES-256"
        key => "$KEY_FROM_SECRETSTORE"
      }
    }
  }
}
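For concreteness, here is a rough sketch of the decode path with the proposed decrypt step in place, written as a standalone Ruby function rather than as actual codec code. The AES-256-CBC mapping, the prepended IV, and all of the names here are assumptions for illustration only:

# Sketch only: not the shipped logstash-codec-avro code.
require "avro"
require "base64"
require "openssl"
require "stringio"

# key         would come from the proposed decrypt => { key => ... } option
# schema_json would come from the codec's existing schema handling
def decode_encrypted_avro(base64_data, key, schema_json)
  cipherbytes = Base64.strict_decode64(base64_data)            # existing step 1: base64 -> raw bytes

  # proposed new step: decrypt cipherbytes into plainbytes
  cipher = OpenSSL::Cipher.new("aes-256-cbc")                  # mapped from decrypt.algorithm (assumed)
  cipher.decrypt
  cipher.key = key
  cipher.iv  = cipherbytes.byteslice(0, cipher.iv_len)         # assumes the IV is prepended to the payload
  plainbytes = cipher.update(cipherbytes.byteslice(cipher.iv_len..-1)) + cipher.final

  # existing steps 2 and 3: Avro decode, then the codec would wrap the Hash in a LogStash::Event
  schema  = Avro::Schema.parse(schema_json)
  decoder = Avro::IO::BinaryDecoder.new(StringIO.new(plainbytes))
  Avro::IO::DatumReader.new(schema).read(decoder)
end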
Filters option:
If we were to use the plain codec and emit events from logstash-input-kafka that merely contain a message with our cipherbytes, we would need two filters: one to convert the cipherbytes to plainbytes (using logstash-filter-cipher), and another to decode those plainbytes through Avro into the attributes of our Event. There is presently no logstash-filter-avro-decode, but it would be trivial to make one that wraps the avro gem (a sketch follows the config below).
Our config would look something like this:
input {
  kafka {
    codec => plain {
      charset => "BINARY"
    }
    # connection params
  }
}
filter {
  cipher {
    mode => "decrypt"
    algorithm => "AES-256"
    key => "$KEY_FROM_SECRETSTORE"
  }
  avro {
    source => "message"
  }
}
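Since logstash-filter-avro-decode doesn't exist yet, here is a rough sketch of what wrapping the avro gem in a filter plugin could look like. The config_name, option names, and error tag are illustrative, the schema is assumed to live in a local file, and a real config would also need to point the filter at that schema:

# Sketch only: a hypothetical filter plugin, not an existing gem.
require "logstash/filters/base"
require "avro"
require "stringio"

class LogStash::Filters::Avro < LogStash::Filters::Base
  # matches the `avro { ... }` block in the filter config above
  config_name "avro"

  # field containing the Avro-encoded plainbytes
  config :source, :validate => :string, :default => "message"
  # local path to the Avro schema (assumed; could equally be a URI)
  config :schema_path, :validate => :string, :required => true

  def register
    @schema = Avro::Schema.parse(File.read(@schema_path))
  end

  def filter(event)
    plainbytes = event.get(@source)
    decoder = Avro::IO::BinaryDecoder.new(StringIO.new(plainbytes))
    datum   = Avro::IO::DatumReader.new(@schema).read(decoder)
    datum.each { |field, value| event.set(field, value) }   # copy decoded attributes onto the Event
    filter_matched(event)
  rescue StandardError => e
    @logger.error("avro decode failed", :exception => e)
    event.tag("_avrodecodefailure")
  end
end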
That said, it looks like logstash-filter-cipher has some outstanding issues and may need some work before it can be a viable option (e.g., the key attribute is not safeguarded and can show up in debug output, which is less than ideal from a security standpoint).
Additionally, it assumes that the IV used to encrypt the plainbytes will be the first externally-agreed-upon number of bytes of the input, which may or may not be the case for your cipherbytes.
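For reference, that convention looks roughly like this on the producing side (a sketch using Ruby's OpenSSL; the IV length here is just the AES block size and would have to match whatever length the filter is configured to strip):

# Producer-side sketch of the "IV is the first N bytes" convention.
require "openssl"

def encrypt_with_prepended_iv(plainbytes, key)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv                             # 16 random bytes for AES-CBC
  iv + cipher.update(plainbytes) + cipher.final     # cipherbytes = IV || ciphertext
end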