I have a few questions around what encryption options are available in Elasticsearch:
• Does Elasticsearch offer encryption of data at rest (in other words, of all the data it is storing)?
• Does it offer encryption at the index level?
• Does it offer encryption at the field level (or column level, if we were talking in relational database parlance)?
We support running on top of infrastructure-level encryption via dm-crypt (for platinum level customers), and there are options for encrypting or hashing incoming data via Logstash before indexing it into Elasticsearch.
With regard to field and document level encryption, how is the data decrypted so that it can be searched? I'd assume that for it to work, the decrypted data is indexed, it's encrypted when stored, and somehow searched and decrypted when returning results? Is it decrypted on the fly? Is there any documentation where I can read up on how field and document level encryption are done?
There is no field or document level encryption within Elasticsearch, so if you encrypt or hash data prior to indexing, e.g. using Logstash, you will need to handle that translation at the application level, as the encrypted/hashed values are what will be indexed and searchable.
"pointless" depends on your purpose for encryption, and the problem you are trying to solve. Your high level options are:
Store encrypted data in ES. It is not searchable, nor can it be used in aggregations, but any clients that have the correct keys can decrypt the data and make sense of it. This implies that you are using ES simply as a storage system for that field, not a search engine.
Store hashed data using a stable keyed hash. If you configure your hashing process so that it produces the same values for identical input, then you can aggregate & data match on identical values, but you cannot reverse the hashing, nor can you search for original input values (unless you have the key).
Store hashed data using an un-keyed hash. You can aggregate & data match on identical values. You cannot reverse the hashing, but you can search for original input values by passing them through the same hash and searching for the hashed value. You can only perform full-match "keyword" style searches (no prefixes, etc.) due to the hash.
Alternatively, you can run ES on encrypted volumes (e.g. dm-crypt), in which case everything is encrypted at the storage layer, but is in plain text at the application layer.
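To make the two hashing options above a bit more concrete, here is a minimal Python sketch of what the application-level translation could look like. It is only an illustration under assumptions: the choice of SHA-256 and HMAC-SHA-256, the field name `email_hash`, and the helper names are all hypothetical and not something Elasticsearch or Logstash prescribes. The same hashing would have to be applied both when indexing and when building the query.

```python
import hashlib
import hmac


def unkeyed_hash(value: str) -> str:
    """Un-keyed hash: anyone who knows the algorithm can hash a candidate
    value and search for it."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def keyed_hash(value: str, key: bytes) -> str:
    """Stable keyed hash (HMAC): identical input plus identical key gives
    identical output, so values still match and aggregate, but only
    key holders can compute the hash for a search value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def term_query(field: str, hashed_value: str) -> dict:
    """Build an exact-match query body on the hashed field.
    Only full-value matches are possible; prefixes will not work."""
    return {"query": {"term": {field: hashed_value}}}


# Hypothetical usage: the key and field name are made up for this example.
secret_key = b"example-key-kept-outside-elasticsearch"
email = "user@example.com"

# Search side: hash the input the same way it was hashed at index time.
unkeyed_body = term_query("email_hash", unkeyed_hash(email))
keyed_body = term_query("email_hash", keyed_hash(email, secret_key))
```

The resulting dictionaries are intended to be sent as the body of a search request by whatever client you use. Because a hash of a prefix bears no relation to the hash of the full value, only "keyword" style exact matches can work against such a field.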