I've been trying to get elasticsearch to store indexes in an encrypted
format. I know full disk encryption has been suggested here before, but it
is not a solution for me. The only place where the indexed information is
allowed to be available in plain form is in memory. This allows
elasticsearch/Lucene to do its thing while the actual information cannot be
read from disk without the proper key material. As far as I can find,
there's no such implementation for elasticsearch as of yet.
What I've done so far is create I/O wrappers for Lucene 4.3 that accept
read/seek/write requests like regular IndexInput and IndexOutput instances,
but use CipherInputStream and CipherOutputStream to delegate the byte
operations to a 'real' IndexInput or IndexOutput. This seems to work fine.
I've found that elasticsearch does a comparable trick of wrapping index
I/O instances, so I figured my code should be relatively easy to apply to
the elasticsearch code base.
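For readers wondering how a cipher can honour seek requests at all, here is a minimal sketch using only JDK classes (no Lucene types). It assumes AES in CTR mode, which the post does not specify; the class and method names (CtrSeekSketch, cipherAt, decryptRange) are made up for illustration. The idea is that the counter for any block can be computed directly from the byte offset, so a read can start mid-file without decrypting everything before it.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not the wrapper code from the post.
class CtrSeekSketch {
    private static final int BLOCK = 16; // AES block size in bytes

    /** Returns an AES/CTR cipher positioned at byte `offset` of the stream. */
    static Cipher cipherAt(byte[] key, byte[] baseIv, long offset, int mode) throws Exception {
        // Counter of the block containing `offset`: baseIv + offset / 16,
        // treated as a 128-bit big-endian integer (low 16 bytes kept on overflow).
        BigInteger counter = new BigInteger(1, baseIv).add(BigInteger.valueOf(offset / BLOCK));
        byte[] raw = counter.toByteArray();
        byte[] iv = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK);
        System.arraycopy(raw, raw.length - n, iv, BLOCK - n, n);

        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));

        // Discard the keystream bytes that precede `offset` within its block.
        int skip = (int) (offset % BLOCK);
        if (skip > 0) {
            cipher.update(new byte[skip]);
        }
        return cipher;
    }

    /** Decrypts `length` bytes starting at `offset` without touching earlier bytes. */
    static byte[] decryptRange(byte[] key, byte[] baseIv, byte[] ciphertext,
                               int offset, int length) throws Exception {
        return cipherAt(key, baseIv, offset, Cipher.DECRYPT_MODE)
                .doFinal(ciphertext, offset, length);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // all-zero demo key, never use in practice
        byte[] iv = new byte[16];  // all-zero demo IV, never reuse in practice
        byte[] plain = "The quick brown fox jumps over the lazy dog"
                .getBytes(StandardCharsets.US_ASCII);

        byte[] encrypted = cipherAt(key, iv, 0, Cipher.ENCRYPT_MODE).doFinal(plain);
        byte[] slice = decryptRange(key, iv, encrypted, 20, 5);
        System.out.println(new String(slice, StandardCharsets.US_ASCII)); // prints "jumps"
    }
}
```

A block mode such as CBC would need a different approach to seeking, and CTR on its own provides no integrity protection, so treat this only as an illustration of the seek arithmetic.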
Alas, this turns out to be untrue: finding where to insert my wrappers
on top of / below elasticsearch's is proving problematic. Wrapping the
entire directory instance in StoreFileMetaData with one that routes all I/O
requests through my wrappers is problematic, as the tests never seem to
create an empty index. Wrapping the I/O classes in Store.StoreDirectory
doesn't seem to work either, as elasticsearch keeps creating I/O classes
outside this point, causing it to read encrypted data without decrypting it.
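The failure described above can be reproduced in miniature with JDK classes only. The sketch below assumes a made-up Store interface as a stand-in for a directory abstraction (it is not Lucene's Directory or elasticsearch's Store API), and the names MemoryStore and EncryptedStore are likewise hypothetical. It shows why a single choke point matters: any stream opened on the underlying store instead of the wrapper sees ciphertext.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class WrappedStoreSketch {
    /** Hypothetical stand-in for a directory abstraction; NOT a Lucene or ES type. */
    interface Store {
        OutputStream create(String name) throws IOException;
        InputStream open(String name) throws IOException;
    }

    /** In-memory store playing the role of the 'real' on-disk directory. */
    static class MemoryStore implements Store {
        private final Map<String, ByteArrayOutputStream> files = new HashMap<>();
        public OutputStream create(String name) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            files.put(name, out);
            return out;
        }
        public InputStream open(String name) {
            return new ByteArrayInputStream(files.get(name).toByteArray());
        }
    }

    /** Decorator: every stream handed out is wrapped in a cipher stream. */
    static class EncryptedStore implements Store {
        private final Store delegate;
        private final byte[] key;
        private final byte[] iv; // demo only: a real design needs a unique IV per file

        EncryptedStore(Store delegate, byte[] key, byte[] iv) {
            this.delegate = delegate;
            this.key = key;
            this.iv = iv;
        }
        private Cipher cipher(int mode) throws IOException {
            try {
                Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
                c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
                return c;
            } catch (GeneralSecurityException e) {
                throw new IOException(e);
            }
        }
        public OutputStream create(String name) throws IOException {
            return new CipherOutputStream(delegate.create(name), cipher(Cipher.ENCRYPT_MODE));
        }
        public InputStream open(String name) throws IOException {
            return new CipherInputStream(delegate.open(name), cipher(Cipher.DECRYPT_MODE));
        }
    }

    /** Reads a stream to the end and closes it. */
    static byte[] readAll(InputStream in) throws IOException {
        try (InputStream input = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = input.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        MemoryStore raw = new MemoryStore();
        Store encrypted = new EncryptedStore(raw, new byte[16], new byte[16]);
        byte[] plain = "secret".getBytes(StandardCharsets.UTF_8);

        try (OutputStream out = encrypted.create("segment")) {
            out.write(plain);
        }
        // Going through the wrapper decrypts transparently.
        System.out.println(new String(readAll(encrypted.open("segment")), StandardCharsets.UTF_8)); // prints "secret"
        // Bypassing the wrapper yields ciphertext, as in the failure described above.
        System.out.println(Arrays.equals(readAll(raw.open("segment")), plain)); // prints "false"
    }
}
```

In this toy version the wrapper is the only place streams are created, so it works. The problem described in the post is that elasticsearch creates I/O instances in more than one place, so wrapping only one of them leaves paths like the second read above.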
So I'm looking for the best place to hook this into elasticsearch. I've
verified that my encryption/decryption wrappers work with Lucene, so I'd
like to build a POC to make this work within elasticsearch. Any great ideas
from the community?
Hi Mattijs,
Did you ever come up with a solution for encrypting ES indexes? It
seems that similar questions have been asked a few times, but I haven't
found anyone who has developed a solution.
Thanks!
-Mike
On Thursday, May 30, 2013 8:39:18 AM UTC-4, Mattijs Ugen wrote:
On Wednesday, 6 November 2013 04:37:52 UTC+1, Mike Powers wrote:
Please be aware that ES also transmits data unencrypted over the wire
between nodes. There is not much sense in encrypting data on disk when the
data is readable by listening on the network interface. Also, a secure
location for the keys is required, which is a challenge in a distributed
environment.
The ES file I/O activity is scattered over many places. Lucene uses
IndexReader/IndexWriter, but there is no ES I/O layer or anything similar
for the ES metadata add-ons. This would have to be refactored into the code.
After an ES I/O refactoring, an encrypted ES FileChannel approach with a
remote key store provider could lead in the right direction, as in Flume: https://issues.apache.org/jira/browse/FLUME-1424
In my personal opinion, encrypting every piece of data within an ES cluster
is way too expensive; the result will be dog slow, and ES won't deliver
satisfactory results under this kind of requirement. And there are many
other attack vectors if unauthorized access to ES nodes is possible. So
the safest approach is to lock ES servers in a private network and configure
the servers so that nobody except authorized staff can access them.
Security is not just about encrypting data, but about preventing intrusion
into systems.