Attachment plugin: why does it return extracted text only if asked for "file" field explicitly?

(Vorou) #1

If I ask for the file field explicitly in my query, the file field in the response contains the extracted text. If I don't ask for any fields, the file field (in _source) is a subdocument with _name, _content, etc. I don't quite get how this works. Am I missing some basic Elasticsearch knowledge here?

I believe the extraction isn't happening on each search (as I set "store": true for the file field), but there is no extracted text anywhere in _source. Where is it stored?

PS: By the way, if I set "store": false, will Elasticsearch re-extract the text from the base64 content on each search request?

(David Pilato) #2

The source is never modified by elasticsearch. Other sub fields are generated and indexed at index time.
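To make the distinction concrete, here is a toy model in Python. It is not Elasticsearch or plugin code: `ToyIndex`, its attributes, and the base64 decode standing in for text extraction are all assumptions made for illustration. It only sketches the idea that _source is kept verbatim, while derived values are produced once at index time and kept separately.

```python
import base64


class ToyIndex:
    """Toy stand-in for an index (hypothetical, not Elasticsearch code)."""

    def __init__(self, store_extracted=True):
        self.store_extracted = store_extracted
        self.sources = {}        # doc id -> original JSON, never modified
        self.stored_fields = {}  # doc id -> values kept because of "store": true
        self.inverted = {}       # term -> doc ids, used for searching

    def index(self, doc_id, source):
        self.sources[doc_id] = source
        # Stand-in for extraction: runs once here, at index time.
        text = base64.b64decode(source["file"]["_content"]).decode("utf-8")
        for term in text.lower().split():
            self.inverted.setdefault(term, set()).add(doc_id)
        if self.store_extracted:
            self.stored_fields[doc_id] = {"file": text}

    def search(self, term, fields=None):
        hits = []
        for doc_id in sorted(self.inverted.get(term.lower(), ())):
            if fields is None:
                # No fields requested: return the untouched source.
                hits.append({"_id": doc_id, "_source": self.sources[doc_id]})
            else:
                # Fields requested: read stored values, no re-extraction.
                stored = self.stored_fields.get(doc_id, {})
                picked = {f: stored[f] for f in fields if f in stored}
                hits.append({"_id": doc_id, "fields": picked})
        return hits


payload = base64.b64encode(b"hello attachment world").decode("ascii")
idx = ToyIndex(store_extracted=True)
idx.index("1", {"file": {"_name": "note.txt", "_content": payload}})

print(idx.search("hello"))                   # hit carries _source with base64
print(idx.search("hello", fields=["file"]))  # hit carries the extracted text
```

In this sketch, a search never touches the base64 payload; with `store_extracted=False` the document is still found by its terms, but there is no stored text to return.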

(Vorou) #3

Thanks! Your words pointed me in the right direction.

For anyone struggling with the same question, the post which cleared it up for me.

Could you please answer the PS part? If I set "store": false for a field of type attachment, will the attachment plugin re-extract the text from the base64 in _source on each search request?

(David Pilato) #4

You can set it to false unless you need to highlight the text.
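For reference, the three kinds of request discussed in this thread can be sketched as plain request bodies, built here as Python dicts. The search term is made up, and querying or highlighting on "file" directly is an assumption: the exact sub-field name that holds the extracted text depends on the plugin version. The `fields` key is the stored-fields parameter of the search API from that era.

```python
import json

# Hypothetical query against the attachment field's extracted text.
query = {"match": {"file": "hello"}}

# No "fields" key: each hit carries _source, the original JSON
# with _name, _content (base64), etc.
source_request = {"query": query}

# Explicit "fields": each hit carries the stored value of "file",
# which is only available when the field is mapped with "store": true.
fields_request = {"query": query, "fields": ["file"]}

# Highlighting needs the text to be retrievable, which is the case
# mentioned above for keeping "store": true.
highlight_request = {"query": query, "highlight": {"fields": {"file": {}}}}

for name, body in [
    ("source", source_request),
    ("fields", fields_request),
    ("highlight", highlight_request),
]:
    print(name, json.dumps(body, sort_keys=True))
```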
