Experience combining stemmed and unstemmed tokens in the same field


(Alex Roytman) #1

0.9.6-SNAPSHOT implements disjunction for tokens with the same offset in
match queries, so AND in match queries is now handled well for synonyms or
other analyzers that produce multiple tokens at the same offset.
Before, using such an analyzer at query time would fail with match/AND
because all the variants of a token were ANDed together.

In my case it helped me simplify stemming a lot.

I put both the stemmed and the original token into the same field. I have a
couple of hundred small fields that are searchable independently and
contribute to _all, and I would hate to create and maintain a mapping with
two fields per property and two _all fields (multi-field style) and combine
them at query time, even though that potentially allows for better boosting.
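
For readers who want to try the single-field approach, here is a minimal sketch of index settings written as a Python dict. It assumes the keyword_repeat and unique token filters are available in your Elasticsearch version; the analyzer name stem_with_original and the filter name unique_stem are made up for illustration, and this is not necessarily the configuration used in this post.

```python
import json

# Hypothetical names: "stem_with_original" and "unique_stem" are illustrative only.
# Assumed filter chain: keyword_repeat emits every token twice (one copy marked as
# a keyword), the stemmer changes only the unmarked copy, and unique with
# only_on_same_position drops the duplicate when the stem equals the original.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "unique_stem": {"type": "unique", "only_on_same_position": True}
            },
            "analyzer": {
                "stem_with_original": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "keyword_repeat",
                        "porter_stem",
                        "unique_stem",
                    ],
                }
            },
        }
    }
}

# Print the JSON body that would be sent when creating the index.
print(json.dumps(settings, indent=2))
```

The order of the filters matters in this sketch: keyword_repeat has to come before the stemmer, and the unique filter has to come after it.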

Before, I had to use a bool query combining two searches on _all (which
contains both stemmed and original tokens), one with unstemmed and one with
stemmed tokens (the unstemmed ones were boosted), because AND would not work.
Consider searching for "blood cells" in two documents:

  1. blood cell: indexed as blood,cell
  2. blood cells: indexed as blood,cells,cell

If the match query uses the same analyzer as at index time, it logically
becomes blood AND cells AND cell and finds only the second document.

So I had to use a bool or disjunction query combining the same query with a
no-stemming analyzer and with a stemming analyzer (without the original
words). It worked OK but had some relevance issues.
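
A rough sketch of what such a workaround query might look like, built as a Python dict. The analyzer names no_stem and stem_only and the boost value 2.0 are assumptions for illustration; the post does not give the actual names or numbers.

```python
import json

def build_workaround_query(text, unstemmed_boost=2.0):
    """Build a bool query with two match/AND clauses on _all.

    "no_stem" and "stem_only" are hypothetical query-time analyzers:
    the first keeps the original words, the second emits only stems.
    """
    def match_all_field(analyzer, boost):
        return {
            "match": {
                "_all": {
                    "query": text,
                    "operator": "and",
                    "analyzer": analyzer,
                    "boost": boost,
                }
            }
        }

    return {
        "query": {
            "bool": {
                # Either clause may match; the unstemmed one scores higher.
                "should": [
                    match_all_field("no_stem", unstemmed_boost),
                    match_all_field("stem_only", 1.0),
                ]
            }
        }
    }

query = build_workaround_query("blood cells")
print(json.dumps(query, indent=2))
```

Each clause analyzes the text with a single-token-per-position analyzer, so AND works inside each clause, at the cost of running two queries and tuning the boost by hand.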

Now, with the 0.9.6-SNAPSHOT enhancement, the index-time analyzer used at
query time logically produces blood AND (cells OR cell), which finds both
documents.
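
The difference between the two behaviours can be illustrated with a toy simulation in plain Python. This is not Elasticsearch or Lucene code: the analyzer and the naive_stem function are simplified stand-ins, and "position" here plays the role of what is called "offset" above.

```python
from collections import defaultdict

def naive_stem(word):
    # Hypothetical stand-in for a real stemmer: strip a trailing "s".
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def analyze(text, stem=naive_stem):
    """Toy analyzer: emit each word, plus its stem if different, at the same position."""
    tokens = []
    for pos, word in enumerate(text.lower().split()):
        tokens.append((pos, word))
        stemmed = stem(word)
        if stemmed != word:
            tokens.append((pos, stemmed))
    return tokens

def matches_and_all(query_tokens, doc_terms):
    # Old behaviour: every token variant is a required clause.
    return all(term in doc_terms for _, term in query_tokens)

def matches_and_by_position(query_tokens, doc_terms):
    # New behaviour: OR within a position, AND across positions.
    by_pos = defaultdict(set)
    for pos, term in query_tokens:
        by_pos[pos].add(term)
    return all(terms & doc_terms for terms in by_pos.values())

docs = {1: "blood cell", 2: "blood cells"}
index = {doc_id: {term for _, term in analyze(text)} for doc_id, text in docs.items()}
query = analyze("blood cells")

old_hits = [d for d, terms in index.items() if matches_and_all(query, terms)]
new_hits = [d for d, terms in index.items() if matches_and_by_position(query, terms)]
print(old_hits)  # [2]
print(new_hits)  # [1, 2]
```

Document 1 only contains "cell", so it fails when "cells" is a required term, but it passes once "cells" and "cell" are alternatives for the same position.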

Relevance overall seems to be better (except for a few cases where a boost
on the original token was more useful than a boost on the phrase query; I
combine match phrase and match AND in my searches).
It is also faster, since it simplified my queries.

If only I could tell the analyzer to give some negative boost to the stemmed
token (when it differs from the original), I think I would have stemming
working the way I want it.

What do you think?

Alex


