Reading encrypted Avro records, do I need a new codec?

I think we have two options:

  • modify or replace the avro codec; OR
  • chain together filters (and write one of our own) to do the work of a codec

codec option:

The existing avro codec:

  1. receives a single base64-encoded string
  2. decodes that string into raw bytes
  3. uses the avro gem to convert those bytes to a ruby Hash
  4. creates the Logstash Event

From what I can tell, we're looking to inject a new step between #1 and #2 that decrypts cipherbytes into plainbytes. That way we can do all of the decoding in the codec and emit events that are fully-contextualised from the get-go.
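To make the ordering concrete, here is a minimal sketch of the decode path with the extra step added. It is not the codec's actual code: `decrypt_payload` and `decode_record` are hypothetical names, the `aes-256-cbc` mode and the explicitly supplied IV are assumptions, and `parse` is a stand-in for the avro gem call so the snippet only needs Ruby's standard library. It follows the ordering described above (decrypt first, then base64-decode); if your producer encrypts the raw Avro bytes and base64-encodes afterwards, the two steps would swap.

```ruby
require "base64"
require "openssl"

# Hypothetical helper, not part of logstash-codec-avro.
# Assumes the key and IV are supplied by the caller and that CBC mode is in use.
def decrypt_payload(cipherbytes, key:, iv:, algorithm: "aes-256-cbc")
  cipher = OpenSSL::Cipher.new(algorithm)
  cipher.decrypt
  cipher.key = key
  cipher.iv = iv
  cipher.update(cipherbytes) + cipher.final
end

# Mirrors the codec's steps, with decryption inserted between #1 and #2.
# `parse` stands in for the Avro deserialisation in step 3.
def decode_record(cipherbytes, key:, iv:, parse:)
  base64_string = decrypt_payload(cipherbytes, key: key, iv: iv) # new step
  raw_bytes = Base64.strict_decode64(base64_string)              # step 2
  parse.call(raw_bytes)                                          # step 3
end
```

In a real codec the result of `parse` would then be wrapped in a Logstash Event (step 4).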

This will require modifications to the avro codec, but we could easily support a pipeline config that looked like this:

input {
  kafka {
    codec => avro {
      decrypt => {
        algorithm => "AES-256"
        key => "$KEY_FROM_SECRETSTORE"
      }
    }
  }
}

filters option:

If we were to use the plain codec, logstash-input-kafka would emit events that merely contain a message with our cipherbytes. We would then need two filters: one that converts the cipherbytes to plainbytes (using logstash-filter-cipher), and another that runs those plainbytes through Avro to populate the attributes of our Event. There is presently no logstash-filter-avro-decode, but it would be trivial to make one that wraps the avro gem.

Our config would look something like this:

input {
  kafka {
    codec => plain {
      charset => "BINARY"
    }
    # connection params
  }
}
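filter {
  cipher {
    mode => "decrypt"
    # OpenSSL cipher name; the CBC mode here is an assumption
    algorithm => "aes-256-cbc"
    key => "$KEY_FROM_SECRETSTORE"
  }
  # hypothetical filter; it does not exist yet
  avro {
    source => "message"
  }
}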

That said, it looks like logstash-filter-cipher has some outstanding issues and may need some work before it can be a viable option (e.g., the key attribute is not safeguarded and can show up in debug output, which is less than ideal from a security standpoint).

Additionally, it assumes that the IV used to encrypt the plainbytes will be the first N bytes of the input, where N is agreed upon externally; that may or may not be the case for your cipherbytes.
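For illustration, this is roughly what an IV-prefixed layout looks like when decrypting by hand with Ruby's OpenSSL bindings. It is a sketch under assumptions, not the filter's implementation: `split_iv_and_decrypt` is a hypothetical name, and the 16-byte IV length and `aes-256-cbc` algorithm are example values you would replace with whatever your producer actually uses.

```ruby
require "openssl"

# Hypothetical helper: assumes the input is laid out as IV || ciphertext,
# with iv_length agreed upon out of band.
def split_iv_and_decrypt(input, key:, iv_length: 16, algorithm: "aes-256-cbc")
  raise ArgumentError, "input is too short to contain an IV" if input.bytesize <= iv_length

  iv = input.byteslice(0, iv_length)
  ciphertext = input.byteslice(iv_length, input.bytesize - iv_length)

  cipher = OpenSSL::Cipher.new(algorithm)
  cipher.decrypt
  cipher.key = key
  cipher.iv = iv
  cipher.update(ciphertext) + cipher.final
end
```

If your cipherbytes carry the IV somewhere else (a header, a separate field, or a fixed value), this prefix assumption will not hold and decryption will either fail or return garbage.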
