We recently updated from 0.90.1 to 0.90.6 and our highlighting tests began
to fail. Updating to 0.90.7 didn't work, so I think there is a bug, or at
least something changed in the specification...
I added a Gist to reproduce the problem:
Here, I am searching for "te" in a field containing "some text to
highlight". The field is folded and tokenized using the ICU plug-in. The
value I expect (and got in earlier versions) is "some text to
highlight", but the returned value is "some textto
highlight". I checked with the three highlighters (I only need the
postings highlighter, but I thought I should check with the others as well).
I am now using "version": "4.1", even though it may cause problems, as
stated in the post. Also, I forgot to specify a different
"search_analyzer" in my Gist, which explains the match against "to". I
was trying to build an example with a minimal configuration and cut that out.
I think, however, that the documentation should have a few lines explaining
what you can and cannot expect from highlighting, because it can drive you
crazy.
On Tuesday, November 19, 2013 11:52:42 AM UTC+1, Guillermo Arias del Río
wrote:
Hi, all!
We recently updated from 0.90.1 to 0.90.6 and our highlighting tests began
to fail. Updating to 0.90.7 didn't work, so I think there is a bug, or at
least something changed in the specification...
Here, I am searching for "te" in a field containing "some text to
highlight". The field is folded and tokenized using the ICU plug-in. The
value I expect (and got in earlier versions) is "some text to
highlight", but the returned value is "some textto
highlight". I checked with the three highlighters (I only need the
postings highlighter, but I thought I should check with the others as well).
"Some text to highlight" becomes s, t, t, h . At query time, te becomes t
as well (since you apply the same analyzer at search time too), which is why
you get the second and third tokens highlighted. That makes sense to me.
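
To make that concrete, here is a rough Python sketch of the effect. It is not the ICU analyzer or any Elasticsearch code: `analyze` is a hypothetical stand-in that reduces every word to its lowercased first character (which is what the token list above suggests), and `highlight` is a hypothetical helper that wraps matching words in `<em>` tags as placeholder highlight markup.

```python
import re


def analyze(text):
    """Hypothetical analyzer: one token per word, reduced to its first
    lowercased character, together with the word's character offsets."""
    return [
        (match.group(0)[0].lower(), match.start(), match.end())
        for match in re.finditer(r"\w+", text)
    ]


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every word whose token matches a query token.

    The query goes through the same analyzer as the field, mirroring the
    case where no separate search analyzer is configured.
    """
    query_terms = {term for term, _, _ in analyze(query)}
    pieces = []
    last = 0
    for term, start, end in analyze(text):
        if term in query_terms:
            pieces.append(text[last:start])
            pieces.append(pre + text[start:end] + post)
            last = end
    pieces.append(text[last:])
    return "".join(pieces)


result = highlight("some text to highlight", "te")
print(result)  # some <em>text</em> <em>to</em> highlight
```

Because "te" is analyzed down to the single token t, it matches both "text" and "to". Under these assumptions, the only way to avoid the second match is to analyze the query differently from the field.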
On Tuesday, November 19, 2013 2:18:38 PM UTC+1, Guillermo Arias del Río
wrote:
I think, however, that the documentation should have a few lines
explaining what you can and cannot expect from highlighting, because it can
drive you crazy.