Exceptions during Highlighting: InvalidTokenOffsetsException

Hello,

I'm currently working on autocompletion for a large dataset
(GeoNames).
My schema: https://gist.github.com/424ce0205a9a16e7afe1

I've imported a few countries to run some tests. Most queries are
successful, but sometimes the following exception is raised:

[...]
Fetch Failed [Failed to highlight field [name.partial]]]; nested:
InvalidTokenOffsetsException[Token dussvitz exceeds length of provided
text sized 7];
[...]

The query looks like this (see my comment in the gist):
https://gist.github.com/424ce0205a9a16e7afe1#comments

The exception is raised ONLY when highlighting fields like
"name.partial", "name.partial_non_ascii", "alternateNames.partial", etc.

I'm out of ideas and hope that someone can help me out.

I found it :). It seems to be a bug in the "edgeNGram" filter. I switched
the order of my filters from:

index.analysis.analyzer.partial.filter.3: name_ngrams
index.analysis.analyzer.partial.filter.2: asciifolding
index.analysis.analyzer.partial.filter.1: lowercase
index.analysis.analyzer.partial.filter.0: standard

to

index.analysis.analyzer.partial.filter.3: asciifolding
index.analysis.analyzer.partial.filter.2: name_ngrams
index.analysis.analyzer.partial.filter.1: lowercase
index.analysis.analyzer.partial.filter.0: standard

and it works. Could it be this problem:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201112.mbox/<CAOdYfZU_pe3P-xspsACOhuJwNYTj+=K47uE6a4LYa7=jabB+2A@mail.gmail.com>

https://issues.apache.org/jira/browse/LUCENE-1500

Maybe I should file a bug report.
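For completeness, the full working analyzer might look like this in the index settings. This is only a sketch: the edgeNGram parameter values (min_gram, max_gram, side) are assumptions for illustration; the actual definition is in the gist.

```
index.analysis.analyzer.partial.type: custom
index.analysis.analyzer.partial.tokenizer: standard
index.analysis.analyzer.partial.filter.0: standard
index.analysis.analyzer.partial.filter.1: lowercase
index.analysis.analyzer.partial.filter.2: name_ngrams
index.analysis.analyzer.partial.filter.3: asciifolding
index.analysis.filter.name_ngrams.type: edgeNGram
index.analysis.filter.name_ngrams.min_gram: 2    # assumed value
index.analysis.filter.name_ngrams.max_gram: 15   # assumed value
index.analysis.filter.name_ngrams.side: front    # assumed value
```

The key point is the filter order: the edge n-grams are produced before ASCII folding is applied.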

On 13 Dec., 18:06, BowlingX heidrich.da...@googlemail.com wrote:


Nice catch!

On Wed, Dec 14, 2011 at 12:12 PM, BowlingX heidrich.david@googlemail.com wrote:


So, is it a bug? Or am I doing anything wrong?

2011/12/16 Shay Banon kimchy@gmail.com


I don't know; I need to check in Lucene.

On Sat, Dec 17, 2011 at 1:23 AM, David Heidrich heidrich.david@googlemail.com wrote:


The bug is caused by a wrong calculation of the tokens' offsets. Some filters generate additional tokens whose text length is longer than that of the original token (the one before re-analysis).
This leads to wrongly calculated start and end offsets, which makes highlighting go badly wrong.

The highlighter tries to insert highlighting tags before the start offset and after the end offset of each token. If a token's offsets are wrongly calculated and the token sits at the end of the field value, the highlighter will very likely try to write the tags beyond the field length, raising the exception.
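The failure mode can be sketched in a few lines. This is not the actual Lucene highlighter, just a hypothetical minimal one that wraps each token in tags using its offsets; it shows how a folded token ("Düßvitz" becoming the 8-character "dussvitz") can claim an end offset past the 7-character stored text, reproducing the error message from the thread:

```python
def highlight(text, tokens):
    """Wrap each (term, start, end) token in <em> tags, mimicking how a
    highlighter applies token offsets to the stored field value."""
    out = []
    pos = 0
    for term, start, end in tokens:
        if end > len(text):
            # The situation behind InvalidTokenOffsetsException: the token
            # claims more characters than the stored field contains.
            raise ValueError(
                "Token %s exceeds length of provided text sized %d"
                % (term, len(text))
            )
        out.append(text[pos:start])
        out.append("<em>" + text[start:end] + "</em>")
        pos = end
    out.append(text[pos:])
    return "".join(out)

stored = "Düßvitz"  # 7 characters as stored in the field

# Correct offsets fit the stored text:
print(highlight(stored, [("düßvitz", 0, 7)]))  # <em>Düßvitz</em>

try:
    # After ASCII folding, "dussvitz" has 8 characters; a filter that
    # reports the folded length as the end offset overruns the text:
    highlight(stored, [("dussvitz", 0, 8)])
except ValueError as e:
    print(e)  # Token dussvitz exceeds length of provided text sized 7
```

Swapping the filter order so the n-grams are cut before folding means every emitted token's offsets still fit inside the original text.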

I'm having the same problem with an additional plugin for decompounding German words.

I actually have no idea how to solve it (though I wonder whether other filters, like the stemmers, also generate longer tokens, and how they manage to keep the offsets correct).

It's an old post, but maybe it can be useful to someone...