Hello there! I have some questions about the Attachment field, result
highlighting, and suggestions.
Does the Attachment field store the whole file or only the extracted text by
default?
If it stores the whole file, is there any way to make it store only the
extracted text? Or should I extract the text with Apache Tika first and then
put it into ES for that purpose?
Is it possible to set a min and max for the number of words that will be
displayed in the highlight field?
And almost the same for suggestions. Can I set up phrase
suggestions? I.e. returning "Green grass", "Green logo", etc. for a "Green"
suggestion query.
The attachment part is off the top of my head only.
The original document is kept as base64 inside the source, which is
stored at index time. The field itself is not stored, iirc.
You could exclude it from being stored in the source. Using Tika as a
preprocessing step is a separate matter, but it might be a good idea, since
you would not have to restart your whole cluster if you wanted to
update your Tika version, for example.
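To make the source exclusion concrete, here is a rough sketch of a mapping body built in Python. It assumes the mapper-attachments plugin (field type "attachment") and the `_source` `excludes` setting. The type name `document` and the field name `file` are made up for illustration, so adjust them to your index and check the result against your ES version.

```python
import json


def build_attachment_mapping(doc_type="document", attachment_field="file"):
    """Build a mapping body that indexes an attachment field but leaves
    the base64 payload out of the stored _source.

    doc_type and attachment_field are hypothetical names for this sketch.
    """
    return {
        doc_type: {
            # Drop the raw base64 field from the stored _source.
            "_source": {"excludes": [attachment_field]},
            "properties": {
                # Field type provided by the mapper-attachments plugin.
                attachment_field: {"type": "attachment"},
            },
        }
    }


if __name__ == "__main__":
    # Print the JSON body you would send when creating the mapping.
    print(json.dumps(build_attachment_mapping(), indent=2))
```

Keep in mind that once the field is excluded from the source, you cannot get the original file back from ES, so you would need to keep it somewhere else.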
Elasticsearch has three different suggester implementations for
different use cases. There is a dedicated phrase suggester, but maybe your
use case is actually better served by the completion suggester, see