Annotated text plugin and index_options/term_vectors

I'm using the annotated_text field to index documents, but for larger documents I'm getting errors:

index has exceeded [1000000] - maximum allowed to be analyzed for highlighting. This maximum can be set by changing the [index.highlight.max_analyzed_offset] index level setting. For large texts, indexing with offsets or term vectors is recommended!

I've tried adding "index_options": "offsets" and "term_vector": "with_positions_offsets", but neither seems to have any effect — the mapping comes back without them on the annotated_text fields:

PUT test_index
{
    "mappings": {
        "properties": {
            "annotated_with_offsets": {
                "type": "annotated_text",
                "index_options": "offsets"
            },
            "annotated_with_term_vector": {
                "type": "annotated_text",
                "term_vector": "with_positions_offsets"
            },
            "text_with_offsets": {
                "type": "text",
                "index_options": "offsets"
            },
            "text_with_term_vector": {
                "type": "text",
                "term_vector": "with_positions_offsets"
            }
        }
    }
}
GET /test_index/_mapping

{
    "test_index": {
        "mappings": {
            "properties": {
                "annotated_with_offsets": {
                    "type": "annotated_text"
                },
                "annotated_with_term_vector": {
                    "type": "annotated_text"
                },
                "text_with_offsets": {
                    "type": "text",
                    "index_options": "offsets"
                },
                "text_with_term_vector": {
                    "type": "text",
                    "term_vector": "with_positions_offsets"
                }
            }
        }
    }
}

This error is a safeguard that kicks in to avoid servers being overloaded by expensive highlighting tasks.
There is some work underway to stop these errors from failing searches and instead just curtail the highlighting work on individual long docs.

The annotated_text field has a custom highlighter that does some special tricks to deal with annotations, which sadly means it doesn't use term vectors, so those settings won't help here. Some potential workarounds:

  • You could try increasing the index.highlight.max_analyzed_offset setting (I ended up doing that to look at Wikipedia content).
  • You could add a must_not clause on the size of the doc to your query, so that super-large documents are excluded and never hit this error.
  • You could break up large documents into smaller ones, e.g. instead of indexing whole books, index chapters.
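For the first option, index.highlight.max_analyzed_offset is a dynamic index-level setting, so it can be raised on an existing index with a settings update. Something like this should work (the value here is only an illustration — larger values trade highlighting memory and CPU for coverage of longer documents, so pick one that fits your data):

PUT test_index/_settings
{
    "index.highlight.max_analyzed_offset": 10000000
}

The change takes effect immediately, with no reindex required, since the setting only controls how much text the highlighter is allowed to re-analyze at search time.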