Attachment field questions and some more


(Iv Igi) #1

Hello there! I have some questions about the attachment field, result
highlighting and suggestions.

  1. Does the attachment field store the whole file or only the extracted
    text by default?
  2. If it stores the whole file, is there any way to make it store only the
    extracted text? Or should I extract the text with Apache Tika first and
    then put it into ES for that purpose?
  3. Is it possible to set a minimum and maximum number of words to be
    displayed in a highlighted fragment?
  4. Almost the same question for suggestions: can I set up phrase
    suggestions? I.e. returning "Green grass", "Green logo", etc. for a
    "Green" suggestion query.

Regards, Iv Igi.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8ef29407-5c30-4d3b-bf3e-b968c82dd9eb%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

The attachment answers are off the top of my head only.

  1. The original document is stored as base64 inside the _source, which is
    stored at index time. The attachment field itself is not stored, IIRC.
  2. You could exclude the field from being stored in the _source. Using
    Tika as a separate preprocessing step is another option and might be a
    good idea, since it does not require restarting your whole cluster if,
    for example, you want to upgrade your Tika version.
  3. You can configure fragment_size (together with number_of_fragments),
    which should help there. Note that fragment_size is measured in
    characters, not words; see
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-highlighting.html#_highlighted_fragments
  4. Elasticsearch has three different suggester implementations for
    different use cases. There is a dedicated phrase suggester, but your
    use case may actually be better served by the completion suggester; see:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-term.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
http://www.elasticsearch.org/blog/you-complete-me/ (blog post about the
completion suggester).
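For point 2, here is a minimal sketch of a mapping body that indexes a field as an attachment but keeps the raw base64 payload out of _source. It assumes the ES 1.x-era mapper-attachments plugin (which provides the "attachment" type) and uses "doc" and "file" as placeholder type and field names. One caveat worth checking: once the file is excluded from _source, you probably also need to store the extracted content field if you still want to highlight it, since highlighting needs the text from somewhere.

```python
import json


def attachment_mapping(doc_type: str, field: str) -> dict:
    """Build a mapping body that indexes `field` as an attachment while
    excluding the raw base64 file from the stored _source.

    Assumes the mapper-attachments plugin (ES 1.x era) is installed;
    `doc_type` and `field` are placeholder names.
    """
    return {
        doc_type: {
            # Keep the base64-encoded original out of _source.
            "_source": {"excludes": [field]},
            "properties": {
                # The "attachment" type is provided by mapper-attachments.
                field: {"type": "attachment"},
            },
        }
    }


if __name__ == "__main__":
    # Print the body you would PUT to the index's _mapping endpoint.
    print(json.dumps(attachment_mapping("doc", "file"), indent=2))
```

The trade-off is that Elasticsearch can no longer return or reindex the original file, so you would need to keep it somewhere else.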
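For point 3, a rough sketch of a search body that asks for highlighted fragments with an explicit fragment_size and number_of_fragments. The field name "file" is a placeholder, and because fragment_size counts characters rather than words, you can only approximate a word range by picking a character size; the values below are illustrative, not recommendations.

```python
import json


def highlight_query(field: str, text: str, fragment_size: int = 150,
                    number_of_fragments: int = 3) -> dict:
    """Build a search body that matches `text` in `field` and requests
    highlighted fragments of that field.

    fragment_size is a length in characters, not in words.
    """
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "fields": {
                field: {
                    "fragment_size": fragment_size,
                    "number_of_fragments": number_of_fragments,
                }
            }
        },
    }


if __name__ == "__main__":
    # "file" is a placeholder field name.
    print(json.dumps(highlight_query("file", "green grass"), indent=2))
```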
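For point 4, the "Green" -> "Green grass", "Green logo" behaviour is prefix completion, which is what the completion suggester is built for. Below is a sketch of a completion field mapping and a _suggest request body; "suggest" and "phrases" are placeholder names. The local_prefix_model function is only a hypothetical toy model for intuition (sorted list plus bisect); Elasticsearch itself uses an in-memory FST and analyzes the inputs (so matching is typically case-insensitive), whereas this toy model is case-sensitive.

```python
import bisect
import json

# Mapping fragment with a dedicated completion field ("suggest" is a placeholder).
COMPLETION_MAPPING = {"properties": {"suggest": {"type": "completion"}}}


def suggest_body(prefix: str, field: str = "suggest", size: int = 5) -> dict:
    """Body for the _suggest endpoint using the completion suggester."""
    return {"phrases": {"text": prefix, "completion": {"field": field, "size": size}}}


def local_prefix_model(inputs, prefix, size=5):
    """Hypothetical toy model of prefix completion, for intuition only:
    sort the inputs and return up to `size` entries starting with `prefix`.
    """
    ordered = sorted(inputs)
    # All entries sharing the prefix are contiguous, starting here.
    start = bisect.bisect_left(ordered, prefix)
    out = []
    for item in ordered[start:]:
        if not item.startswith(prefix):
            break
        out.append(item)
        if len(out) == size:
            break
    return out


if __name__ == "__main__":
    print(json.dumps(suggest_body("Green"), indent=2))
    print(local_prefix_model(["Green logo", "Blue sky", "Green grass"], "Green"))
```

Each indexed document would carry its phrases in the completion field (e.g. {"suggest": {"input": ["Green grass"]}}), and the suggester returns whole inputs that start with the typed prefix.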

Hope this helps.

--Alex

On Fri, Jan 24, 2014 at 5:25 AM, Iv Igi sayoneas@gmail.com wrote:


