Confusion on what is actually stored per field

I got quite confused today while doing some testing (Java client) around what
is actually stored in the index for a field. Before investing time in a REST
gist, I just wanted to check whether there is a major misunderstanding on my side:

  • I have a type mapping with _source = disabled and dynamic = false.
  • My field has index = analyzed and store = no.
  • I index some content in the field.
  • I run a get request for the indexed document, configured to return the
    field.

I checked that the mapping I defined is really in effect on the index (so
this is not a basic misconfiguration). I expected the field to be null in the
get response, since it is not stored. If not null, I at least expected only
the analyzed version (e.g. lowercased, since my custom test analyzer
lowercases). However, I always get the original (non-analyzed) value. Is
this correct behavior?

I see what's going on. When you do a get, it's a realtime get, which is read
from the transaction log while the document is still there (as you index more
data, the document will instead be fetched from the "index"). And even if
_source is disabled, the source is still stored in the transaction log (so we
can replay it). I opened an issue so that we will be more consistent:
Issue #1927 (elastic/elasticsearch): "Get API: When _source is disabled, the source is still used if fetched from the transaction log".
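The explanation above can be illustrated with a deliberately simplified model. This is a sketch under assumptions, not Elasticsearch's actual code: `RealtimeGetSketch`, `moveToIndex()` (a hypothetical stand-in for the point where, as more data is indexed, the document is read from the index instead of the transaction log) and the field name `myfield` are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified model (NOT the real Elasticsearch implementation): a realtime
 * get is answered from the transaction log while the document is still
 * there, and the translog keeps the full source even with _source disabled.
 */
public class RealtimeGetSketch {

    private final boolean sourceEnabled;                              // mirrors _source enabled/disabled
    private final Map<String, String> translog = new HashMap<>();     // id -> full source, kept for replay
    private final Map<String, String> storedSource = new HashMap<>(); // id -> _source kept in the "index"

    RealtimeGetSketch(boolean sourceEnabled) {
        this.sourceEnabled = sourceEnabled;
    }

    void index(String id, String source) {
        // The translog always records the full source so the operation can be replayed.
        translog.put(id, source);
    }

    // Hypothetical: documents leave the translog and are served from the index.
    void moveToIndex() {
        if (sourceEnabled) {
            storedSource.putAll(translog);
        }
        // With _source disabled (and the field not stored), nothing retrievable is kept.
        translog.clear();
    }

    /** Returns the document source for the id, or null if none is available. */
    String realtimeGet(String id) {
        String fromTranslog = translog.get(id);
        if (fromTranslog != null) {
            return fromTranslog; // returned regardless of the _source setting
        }
        return storedSource.get(id);
    }

    public static void main(String[] args) {
        RealtimeGetSketch shard = new RealtimeGetSketch(false); // _source disabled
        shard.index("1", "{\"myfield\":\"Hello World\"}");
        System.out.println("from translog: " + shard.realtimeGet("1"));
        shard.moveToIndex();
        System.out.println("from index:    " + shard.realtimeGet("1"));
    }
}
```

With _source disabled, the first get returns the full original document and the second returns null, which is the inconsistency the issue describes.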

On Tue, May 8, 2012 at 4:29 PM, Jan Fiedler fiedler.jan@gmail.com wrote the question above.

Awesome! Thanks for getting back so quickly. The background to this
question/test is that I plan to implement a special 'hashing' analyzer for
security-relevant fields (e.g. social security numbers). I am trying to
provide some protection for such fields (in case the index gets compromised
or stolen) while still offering some (very) basic search capabilities on
them. Obviously, for this to work it must not be possible to access the
original field value, which is why I was so surprised by my test case.
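A minimal sketch of the hashing step, assuming exact-match search is all that's needed: the value is normalized and replaced by a keyed hash (HMAC-SHA256 via `javax.crypto.Mac`), and the query value is hashed the same way at search time. This is plain Java rather than an Elasticsearch analyzer; wiring it into a custom analyzer or token filter is not shown, and `HashingTermSketch` and the digits-only normalization are hypothetical choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/**
 * Sketch of the 'hashing' idea: the index would hold only a keyed hash of
 * the normalized value, and exact-match queries hash their input the same way.
 */
public class HashingTermSketch {

    private final Mac mac;

    HashingTermSketch(byte[] secretKey) {
        Mac m;
        try {
            m = Mac.getInstance("HmacSHA256");
            m.init(new SecretKeySpec(secretKey, "HmacSHA256"));
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
        this.mac = m;
    }

    // Hypothetical normalization: keep digits only, so "123-45-6789" and "123 45 6789" match.
    static String normalize(String value) {
        return value.replaceAll("[^0-9]", "");
    }

    /** Returns the hex-encoded HMAC of the normalized value (the term to index or query). */
    String hashTerm(String value) {
        byte[] digest = mac.doFinal(normalize(value).getBytes(StandardCharsets.UTF_8)); // doFinal resets the Mac
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        HashingTermSketch hasher = new HashingTermSketch(
                "replace-with-a-real-secret".getBytes(StandardCharsets.UTF_8));
        String indexed = hasher.hashTerm("123-45-6789"); // what the index would hold
        String query = hasher.hashTerm("123 45 6789");   // hashed the same way at search time
        System.out.println("indexed term:  " + indexed);
        System.out.println("query matches: " + indexed.equals(query));
    }
}
```

A keyed hash is used here because an unkeyed hash of a value with only about 10^9 possibilities, such as a nine-digit social security number, can be reversed by hashing every candidate. Also note that, per the reply above, the transaction log keeps the source as sent, so hashing on the client before indexing would keep the original value out of the translog as well.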