If I explicitly request the file field in my query, the file field in the response contains the extracted text. If I don't request any fields, file (in _source) is a subdocument with _name, _content, etc. I don't quite understand how this works. Am I missing some basic Elasticsearch knowledge here?
I believe the extraction isn't happening on each search (since I set "store": true for the file field), but the extracted text doesn't appear anywhere in _source. Where is it stored?
PS: By the way, if I set "store": false, will Elastic re-extract text from base64 on each search request?
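To make the two cases concrete, here is a rough sketch of the request bodies I am comparing. It only builds the JSON and sends nothing. The helper name, the match query and the search text are made up for illustration, and I am assuming the older `fields` search parameter for requesting stored fields (as far as I know, newer Elasticsearch versions call it `stored_fields`).

```python
import json


def build_search_body(query_text, requested_fields=None):
    """Build a search request body (hypothetical helper, not part of any client)."""
    # Assumption: a simple match query against the attachment field "file".
    body = {"query": {"match": {"file": query_text}}}
    if requested_fields:
        # Explicitly ask for fields by name instead of relying on _source.
        body["fields"] = list(requested_fields)
    return body


# Case 1: "file" is requested explicitly; the response carries the extracted text.
with_field = build_search_body("hello", ["file"])

# Case 2: nothing is requested; the hit carries _source with the original subdocument.
source_only = build_search_body("hello")

print(json.dumps(with_field, indent=2))
print(json.dumps(source_only, indent=2))
```

The only difference between the two requests is the `fields` entry, which is why I suspect the extracted text lives somewhere other than _source.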
Could you please answer the PS part? If I set "store": false for a field of type attachment, will the attachment plugin re-extract the text from the base64 content in _source on each search request?