I need to store and search documents containing highly confidential fields.
I need to make some searches based on these fields.
I do not need to get the values of these fields back when I send a query to Elasticsearch.
I need privacy on these fields: the values must not be accessible via any console, and the data must not be stored in plain text.
Questions:
1- Do Elasticsearch hash fields meet my requirements?
2- Does hashing these fields in Elasticsearch leave my Elasticsearch queries unchanged, so that it is fully transparent for my developers?
As far as I know, Elasticsearch does not have anything that meets those requirements. You can control access to fields using field level security (a commercial feature), but if you are not allowed to see a field I do not think you are allowed to search on it either, as you could then deduce that the field of a returned document contained a specific value.
There have been attempts at encrypting data at the field level in Elasticsearch, but as far as I am aware that limits the features available and is not transparent.
Thanks for your answer Christian. I am referring to the same feature as an LDAP directory when it stores a hash of a password rather than the password itself.
I am referring to this page, Hash Fields | Elastic Common Schema (ECS) Reference [8.11] | Elastic, and I wonder whether Elasticsearch is able to hash some attributes of a document and store them hashed. Then, at query time, it could compare that stored hash with a hash built from the query value.
The page you shared is about the ECS hash fields and how they need to be mapped so that they can be correlated with other indices.
You need to hash the value before indexing it; Elasticsearch itself will not do that for you. Depending on how you are indexing the data, you can use the fingerprint processor in an ingest pipeline, but you will also need to make sure the original field is removed during this process.
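To make the ingest pipeline idea concrete, here is a rough sketch of what the pipeline body could look like, written as a Python dict that you would send to PUT _ingest/pipeline/<your-pipeline-id>. The field names (national_id, national_id_hash), the salt and the helper function name are made up for illustration; only the fingerprint and remove processors and their options come from Elasticsearch. Treat it as a starting point to test on your own cluster, not a verified setup.

```python
import json


def build_hashing_pipeline(source_field, target_field, salt):
    """Build an ingest pipeline body that hashes one field and drops the original.

    source_field, target_field and salt are hypothetical values chosen by the caller.
    """
    return {
        "description": "Hash a confidential field and remove the plain-text value",
        "processors": [
            {
                # Fingerprint processor: writes a hash of the listed fields
                # into target_field.
                "fingerprint": {
                    "fields": [source_field],
                    "target_field": target_field,
                    "method": "SHA-256",
                    "salt": salt,
                }
            },
            {
                # Remove processor: drops the plain-text field so it is not indexed
                # or kept in _source.
                "remove": {"field": source_field}
            },
        ],
    }


pipeline = build_hashing_pipeline("national_id", "national_id_hash", "change-me")
print(json.dumps(pipeline, indent=2))
```

Keep in mind that the plain-text value still travels to the cluster before the pipeline runs, so this only addresses what is stored, not what is sent.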
Well, the hashed value will be a keyword, so you can search on it and compare it with another keyword. However, you cannot search for a plain-text value and get a match on the hashed value in Elasticsearch, or vice versa: the query has to contain the hash itself.
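In practice that means the application has to hash the search term itself before querying. A minimal sketch of that, assuming the hashing is done in the application for both indexing and searching (so both sides are guaranteed to produce the same value), and assuming a keyed HMAC-SHA256 with a secret kept outside Elasticsearch. The function names, the field name and the normalization step are my own choices for illustration, not anything Elasticsearch provides.

```python
import hashlib
import hmac


def hash_value(value: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 of the value.

    The strip/lower normalization is an assumption; drop it if case matters.
    """
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def build_term_query(hash_field: str, plain_value: str, key: bytes) -> dict:
    """Build a term query that matches the stored hash, never the plain text."""
    return {"query": {"term": {hash_field: hash_value(plain_value, key)}}}


# Hypothetical secret and field name; in real use the key comes from a secret store.
secret_key = b"example-secret-key"

# Index time: store only the hash in a keyword field.
document = {"national_id_hash": hash_value("AB123456", secret_key)}

# Query time: hash the search term the same way and run an exact-match term query.
query = build_term_query("national_id_hash", " ab123456 ", secret_key)
```

Because only exact hashes are compared, this supports exact-match lookups only; prefix, wildcard, range or full-text searches on the original value will not work.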