Proximity and phrase search: fast-vector-highlighter vs. plain highlighter, and exposing the highlighter choice


(Tomislav Poljak) #1

Hi,
here https://gist.github.com/1222046 is a reconstruction of highlighting
bugs in the fast-vector-highlighter for proximity searches and for
exact-phrase searches whose phrases contain stop words.

The reconstruction shows two cases where the default (plain) highlighter
works correctly, or at least better than the fast-vector-highlighter
(there may be other queries where the situation is reversed). So my
question is: if fixing these bugs in the fast-vector-highlighter will take
some time, would it be useful to expose the highlighter choice as a quick
fix? Then, for stored fields mapped with "term_vector" :
"with_positions_offsets", users could choose between the
fast-vector-highlighter and the default/plain highlighter. Even though the
fast-vector-highlighter is much faster and should be used for highlighting
matches in fields with stored term vectors, in some cases (such as
proximity queries and phrases containing stop words) it does not highlight
correctly, so being able to fall back to the plain highlighter would help.

Thanks,

Tomislav
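The proposal above amounts to a per-query fallback: use the fast-vector-highlighter by default, but switch to the plain highlighter for the two query shapes reported as broken. Below is a minimal Python sketch of that policy, assuming a client that builds Elasticsearch request bodies as plain dicts. The helper names, the field name `content`, and the tiny `STOP_WORDS` set are hypothetical. The per-field highlight `"type"` values `"plain"` and `"fvh"` mirror the option documented in later Elasticsearch releases, and the mapping uses the current `"text"` field type (releases from this thread's era used `"string"`); check your version's documentation before relying on either.

```python
import json

# Minimal sketch of a client-side highlighter fallback policy.
# Assumptions (hypothetical, not from the thread): the helper names, the
# "content" field used in the example, and this tiny stop-word list.
# Real analyzers ship their own stop-word lists.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def term_vector_mapping(field: str) -> dict:
    """Mapping that stores term vectors with positions and offsets,
    which the fast-vector-highlighter needs."""
    return {
        "properties": {
            field: {"type": "text", "term_vector": "with_positions_offsets"}
        }
    }


def needs_plain_highlighter(phrase: str, slop: int) -> bool:
    """True for the query shapes reported as mis-highlighted by the
    fast-vector-highlighter: proximity phrases (slop > 0) and exact
    phrases that contain stop words."""
    if slop > 0:
        return True
    return any(word.lower() in STOP_WORDS for word in phrase.split())


def build_search_body(field: str, phrase: str, slop: int = 0) -> dict:
    """Phrase query plus a highlight section that picks the highlighter:
    "fvh" (fast vector) by default, "plain" for the problem cases."""
    highlighter = "plain" if needs_plain_highlighter(phrase, slop) else "fvh"
    return {
        "query": {"match_phrase": {field: {"query": phrase, "slop": slop}}},
        "highlight": {"fields": {field: {"type": highlighter}}},
    }


if __name__ == "__main__":
    # Proximity search: falls back to the plain highlighter.
    print(json.dumps(build_search_body("content", "quick fox", slop=3), indent=2))
```

For example, `build_search_body("content", "state of the art")` selects `"plain"` because the phrase contains stop words, while `build_search_body("content", "quick fox")` keeps `"fvh"`. Detecting stop words on the client is only a heuristic; the analyzer actually applied to the field decides which terms are dropped.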


(Robert-2) #2

I was wondering whether you ever found a way to use fast vector
highlighting correctly with proximity searches? I am running into the same problem.

Thanks,
Robert

(In reply to Tomislav Poljak, Friday, September 16, 2011, 9:22:24 AM UTC-4.)


(Shay Banon) #3

I need to chase this down in the actual implementation of the fast vector
highlighter. It's not too difficult, just somewhat time-consuming. Can you open an
issue so we can chase it down there?

(In reply to Tomislav Poljak, Fri, Sep 16, 2011 at 3:22 PM.)


(Robert-2) #4

Created an issue: https://github.com/elasticsearch/elasticsearch/issues/1986

(In reply to kimchy, Tuesday, May 29, 2012, 3:43:47 PM UTC-4.)

