Store only index without a way to retrieve the actual indexed data

(Yaniv Hakim) #1

Hi everyone,

We're thinking of adding Elasticsearch to our product stack, and we want to use it for full-text search and aggregations.

The problem is our data is very sensitive and our main data store is AWS MySQL RDS with encryption at rest.
We need to add only the non-sensitive columns to Elasticsearch, plus the main text column, which is very sensitive. So I want to index this column without storing the REAL data there, keeping only a pointer (id) to the real record in MySQL RDS.

I saw some discussions about disabling _source (setting "enabled" to false). But does that ensure that the original column data will not be saved in Elasticsearch and will never be retrievable?

Any other solution?

Thanks,
Yaniv

(Zachary Tong) #2

Disabling source will prevent the original JSON from being retrievable. But the individual tokens that make up the inverted index will still be "in the clear", and theoretically you could re-build a sort of representation of the original field based on the tokens and positions (and assuming you have access to the inverted index file and know how to parse it).
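
For illustration, a minimal sketch of what an index definition with source disabled might look like, written as a Python dict that serializes to the JSON request body. The field names `body` and `category` are hypothetical, and the layout assumes typeless mappings (Elasticsearch 7.x or later); older versions nest the same settings under a mapping type name.

```python
import json


def build_mapping():
    """Return an index definition that keeps the original JSON out of the index."""
    return {
        "mappings": {
            # Do not keep the original JSON document.
            "_source": {"enabled": False},
            "properties": {
                # Hypothetical sensitive text column: analyzed and searchable.
                # Fields are not stored separately unless "store" is set to true.
                "body": {"type": "text"},
                # Hypothetical non-sensitive column used for aggregations.
                "category": {"type": "keyword"},
            },
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(build_mapping(), indent=2))
```

If the MySQL primary key is used as the document `_id`, search hits still return that `_id`, which is enough to look the real record up in RDS.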

I'm not sure how stringent the requirement is, so the answer is probably "maybe". The inverted index blends the tokens from all the documents into a single index structure, which points from token to document (which is how search works). It's not cleartext JSON that is immediately usable, but it isn't encrypted or entirely obfuscated either.

We generally recommend full disk encryption, FUSE solutions, etc. when encryption at rest is needed.