Hey David, thanks for the continued replies on this thread. I really do appreciate them.
If I understood correctly, highlighting is actually disabled when _source is disabled (per the documentation here), or is there another type of highlighting besides "on the fly highlighting"?
Your reference to "phrase searches" opened up a new world to me: the world of "token graphs", to be exact. As far as I understand, there is no (default) way to disable this (and hence to disable phrase searches), so it stands to reason that a committed attacker could rebuild entire phrases, paragraphs or documents from this information (minus stop words and anything else removed by token filters). Do I understand that correctly? It is a very important point.
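To check my own understanding, here is a toy sketch in Python of what I think the risk looks like. It is not Elasticsearch or Lucene code: `build_positional_index`, `reconstruct` and the `STOP_WORDS` set are names I made up for illustration, and I am assuming the index keeps a position for every term that survives analysis, with removed stop words leaving a gap.

```python
from collections import defaultdict

# Hypothetical stop word list, only for this illustration.
STOP_WORDS = {"the", "a", "an", "of", "is"}


def build_positional_index(text):
    """Map each kept term to the list of positions where it occurs."""
    index = defaultdict(list)
    for position, token in enumerate(text.lower().split()):
        if token not in STOP_WORDS:
            index[token].append(position)
    return dict(index)


def reconstruct(index):
    """Rebuild the token sequence from term -> positions, marking gaps with '_'."""
    by_position = {
        pos: term for term, positions in index.items() for pos in positions
    }
    if not by_position:
        return ""
    last = max(by_position)
    return " ".join(by_position.get(pos, "_") for pos in range(last + 1))


index = build_positional_index("The patient is allergic to penicillin")
print(reconstruct(index))  # _ patient _ allergic to penicillin
```

If that model is roughly right, then even with _source disabled, anyone who can read the positional data (or probe it with enough phrase queries) gets back the text in order, with only the filtered tokens missing.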
It's clear to me that this increases the overall attack surface of our system. However, customers should be properly informed about the tradeoffs of the features they desire. It is our responsibility to explain this as clearly as possible so that they can make an informed decision, and so that we can tailor the overall application features (e.g. disabling search on sensitive documents) and operational procedures (who can access the index, how it is monitored, what the maximum query length is) accordingly.