I've imported a few countries to run some tests. Most queries are
successful, but sometimes the following exception is raised:
[...]
Fetch Failed [Failed to highlight field [name.partial]]]; nested:
InvalidTokenOffsetsException[Token dussvitz exceeds length of provided
text sized 7];
[...]
The query looks like this (see my comment in the gist):
The exception is raised ONLY when highlighting on fields like "name.partial",
"name.partial_non_ascii", "alternateNames.partial", etc.
I'm out of ideas and hope that someone can help me out.
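For readers without access to the gist, a hypothetical minimal query of the kind described (the field name is taken from the exception above; the search term is assumed) would be:

```json
{
  "query": {
    "match": { "name.partial": "duss" }
  },
  "highlight": {
    "fields": { "name.partial": {} }
  }
}
```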
The bug is caused by a wrong calculation of the token offsets. Some filters generate additional tokens whose text is longer than the original token (the one before re-analyzing).
This leads to wrong start and end offsets, which makes highlighting fail: the highlighter inserts the opening tag before the token's start offset and the closing tag after its end offset.
If a token's offsets are miscalculated and the token sits at the end of the field value, the highlighter will very likely try to write the tags past the field length, raising the exception.
I'm having the same problem with an additional plugin for decompounding German words.
I actually have no idea how to solve it (though I wonder whether other filters, like the stemmers, also generate longer tokens, and how they manage the offsets correctly).
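To make the failure mode concrete, here is a minimal Python sketch (not Lucene's actual implementation) of the offset check that fails: a token whose end offset exceeds the field text length cannot be highlighted, because the tags would be written past the end of the text.

```python
def highlight(text, tokens, pre="<em>", post="</em>"):
    """Wrap each (start, end) token span of `text` in highlight tags.

    Raises ValueError when a token's end offset exceeds len(text),
    mirroring Lucene's InvalidTokenOffsetsException.
    """
    out = []
    pos = 0
    for start, end in sorted(tokens):
        if end > len(text):
            raise ValueError(
                f"Token exceeds length of provided text sized {len(text)}")
        out.append(text[pos:start])          # untouched text before the token
        out.append(pre + text[start:end] + post)  # the highlighted token
        pos = end
    out.append(text[pos:])                   # untouched tail
    return "".join(out)

# Correct offsets: the token lies inside the 7-character field value.
print(highlight("Germany", [(0, 7)]))   # <em>Germany</em>

# A filter-generated token with a miscalculated end offset (9 > 7):
try:
    highlight("Germany", [(0, 9)])
except ValueError as e:
    print(e)  # Token exceeds length of provided text sized 7
```

This is exactly the situation in the exception above: the token text produced by the filter ("dussvitz") implies offsets longer than the 7-character field value being highlighted.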
It's an old post, but maybe it can be useful to someone...