How should I encrypt data at rest with Elasticsearch?

Note: while we recommend these options, we cannot provide support for any particular tool or help debugging issues with dm-crypt itself.

To protect Data at Rest through encryption, we believe that, from the Elasticsearch point of view, relying on the underlying operating system to handle this function is best. The recommended encryption method is dm-crypt, which is available on most major Linux distributions; it just needs to be enabled and configured. We cannot recommend a similar solution on Windows, but there are certainly options available.

dm-crypt is recommended for a variety of reasons:

  • Widely used across the industry, showing that it has a lot of support and a robust feature set
  • Uses the same algorithms as FileVault on Mac systems, hardware-based implementations, and other commercial products, while being open source and free
  • Supports many different standardized encryption formats, algorithms and features
  • Can be used to encrypt entire disks, or smaller volumes which are mounted individually. This allows one to encrypt individual indices with different keys for example (although it adds logistical complexity)
  • Comes with a number of tools to manage keys, user roles and authentication, revoke access etc
  • Preserves the semantics of the file system because it operates at the block layer (i.e. under the FS). This means that guarantees like transactional journaling provided by ext, zfs, etc. are kept. This is in contrast to FUSE options, which often re-implement "filesystem-like operations". Since these live above the FS, they can be buggy and introduce corruption into your data that is not protected by the FS guarantees.
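
To make the "encrypt a volume and mount it individually" idea more concrete, here is a minimal sketch that only prints the usual sequence of commands for a LUKS-formatted dm-crypt volume; it does not run them. The device `/dev/sdX`, the mapper name `es_data`, the mount point `/mnt/es_data`, and the choice of ext4 are all placeholders assumed for illustration, and the exact steps may differ on your distribution.

```python
import shlex


def dm_crypt_plan(device, mapper_name, mount_point):
    """Return the commands (as argv lists) for a basic LUKS/dm-crypt volume.

    Nothing is executed here; the caller decides what to do with the plan.
    """
    mapped = f"/dev/mapper/{mapper_name}"
    return [
        # Write a LUKS header to the device (this destroys existing data).
        ["cryptsetup", "luksFormat", device],
        # Unlock the device; the decrypted view appears under /dev/mapper/.
        ["cryptsetup", "open", device, mapper_name],
        # Create a regular filesystem on top of the decrypted block device.
        ["mkfs.ext4", mapped],
        ["mkdir", "-p", mount_point],
        ["mount", mapped, mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for argv in dm_crypt_plan("/dev/sdX", "es_data", "/mnt/es_data"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Once such a volume is mounted, Elasticsearch's `path.data` setting could be pointed at the mount point. Because the filesystem sits on top of the encrypted block device, Elasticsearch itself should not need to know that encryption is involved.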

Some alternatives to dm-crypt:

  • EncryptFS - simpler, less robust feature set
  • Other FUSE (filesystem in user-land) options. There are many of these, with varying levels of support for algorithms and user management. But they will almost always be slower, since they run in user space rather than in the kernel
  • Commercial options, with pros and cons. On the pro side, you are paying someone to support your deployment, so if something goes wrong, you have someone to call. With OSS solutions like dm-crypt, there is a lot written about them, but you are less likely to get immediate help. On the con side, commercial solutions may be less widely used, may be missing features, may not be fully audited, etc. If something does go wrong, you may be "stuck" with that vendor.

General considerations:

  • It's best to use modern Intel CPUs (e.g. from within the last ~3 years), since newer CPUs include hardware instructions specifically for encryption (AES-NI)
  • If swap is enabled, you may need to mount the swap partition as an encrypted volume too, since swapped memory is temporarily "Data at Rest"
  • Note: We recommend that swap be disabled on ES machines anyway, since swapping is bad for performance in general
  • Each tool has its own set of guidelines for various "logistical" operations, like performing backups. Consult each tool's documentation to determine how these are done, since incorrect operation could leave your data "locked" irrecoverably.
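
The CPU and swap points above can be checked quickly on a Linux host. The following is a rough sketch, not an official tool: it assumes an x86 machine where `/proc/cpuinfo` has a `flags` line listing `aes` when the AES instructions are available, and it assumes `/proc/swaps` lists active swap devices after a single header line. The function names are made up for this example.

```python
from pathlib import Path


def has_aes_flag(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return "aes" in value.split()
    return False


def active_swap_devices(swaps_text):
    """Return the device/file names listed in /proc/swaps-style text."""
    lines = swaps_text.splitlines()[1:]  # the first line is the column header
    return [line.split()[0] for line in lines if line.strip()]


if __name__ == "__main__":
    cpuinfo, swaps = Path("/proc/cpuinfo"), Path("/proc/swaps")
    if cpuinfo.exists():
        print("AES instructions available:", has_aes_flag(cpuinfo.read_text()))
    if swaps.exists():
        devices = active_swap_devices(swaps.read_text())
        # Any device listed here is swap that may hold unencrypted memory pages.
        print("Active swap:", devices if devices else "none")
```

If the second check reports active swap, that swap either needs to be encrypted as well or, in line with the recommendation above, disabled.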