Ngram and edgeNgram combined for _all field; or different token filters per field for _all


(Sebastian Kurfuerst) #1

Hey everybody,

(this is my first post to this list; let me start by saying ElasticSearch
is a pleasure to work with, so thanks to everybody involved!)

I'm currently struggling with the question of how to build my index:
Basically, I have a few fields which contain pretty short "human readable
identifiers" that need to be searchable.
Because I have multiple fields per document containing such identifiers,
I'm indexing them in the _all field.

Currently, the _all property is configured with the standard tokenizer, and
the edge NGram token filter -- so people can search for partial words (from
the start of the word) pretty well.
Now, the requirement has arisen to also be able to find any partial word,
not just from the start of each word. However, if the start of the word
matches, it should be higher ranked than a match inside the word.
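
To make the difference concrete, here is a minimal sketch in plain Python (this is not ElasticSearch code; the function names and the gram lengths 2 to 4 are assumptions picked only for illustration) of the tokens each filter would roughly emit for a single word:

```python
def edge_ngrams(word, min_gram=2, max_gram=4):
    """Prefixes of `word` with lengths min_gram..max_gram (edge n-grams)."""
    longest = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, longest + 1)]


def ngrams(word, min_gram=2, max_gram=4):
    """All substrings of `word` with lengths min_gram..max_gram (n-grams)."""
    longest = min(max_gram, len(word))
    return [
        word[i:i + n]
        for n in range(min_gram, longest + 1)
        for i in range(len(word) - n + 1)
    ]


word = "ab12x"          # hypothetical identifier
edge = edge_ngrams(word)  # only fragments anchored at the start of the word
full = ngrams(word)       # fragments from anywhere inside the word
```

Every edge n-gram also shows up among the plain n-grams, so the n-gram filter alone would find both kinds of match, but it cannot tell a match at the start of the word from one in the middle.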

Any thoughts on how to do that? I currently see a few possibilities:

  • Can I somehow set different _all indexing configurations for different
    fields? That is, could the identifiers be indexed as multi_field, once with
    edgeNgram and a higher boost, and once with ngram and a lower boost? That'd
    be the solution I'd prefer, but from reading the docs I doubt it's possible.
  • Can I somehow tell the system to use both the edge ngram and the
    ngram filter in parallel, such that the tokens starting from the
    beginning of every word are indexed twice per document? This should, as I
    understand it, also result in a higher ranking, although it is somewhat crude.
  • Should I kick out the _all field and manually concatenate the different
    strings at index time, indexing the result once with edgeNgram and once
    with ngram, and then at query time boost the edgeNgram results over the
    other ones? (I would dislike this the most, as it affects every place where
    such queries are built...)
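
For the third option, this is roughly what I have in mind, written as plain Python dicts. It is only a sketch under assumptions: the field names `ids_prefix` and `ids_infix`, the filter and analyzer names, the gram lengths and the boost value are all made up, each field is assumed to be mapped with the matching analyzer, and the filter type names `edgeNGram` and `nGram` are spelled as in the docs I am reading, so they may differ between releases.

```python
# Hypothetical field names: the concatenated identifiers, indexed twice.
PREFIX_FIELD = "ids_prefix"  # assumed to use prefix_analyzer
INFIX_FIELD = "ids_infix"    # assumed to use infix_analyzer

# Index settings with one analyzer per n-gram flavour (values are guesses).
ANALYSIS_SETTINGS = {
    "analysis": {
        "filter": {
            "prefix_grams": {"type": "edgeNGram", "min_gram": 2, "max_gram": 15},
            "infix_grams": {"type": "nGram", "min_gram": 2, "max_gram": 15},
        },
        "analyzer": {
            "prefix_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "prefix_grams"],
            },
            "infix_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "infix_grams"],
            },
        }
    }
}


def build_query(text, prefix_boost=3.0):
    """Search both fields; a hit on the prefix field gets a higher boost."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {PREFIX_FIELD: {"query": text, "boost": prefix_boost}}},
                    {"match": {INFIX_FIELD: {"query": text}}},
                ]
            }
        }
    }
```

Keeping the query construction in a single helper like `build_query` would at least limit how many places have to know about the two fields.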

Thank you in advance for providing any advice,

Greets, Sebastian


