Highlighting API

Hi,

how are PRE and POST tags supposed to work in highlighting?

Assume I use:
pre_tags : ["tag1", "tag2"],
post_tags : ["tag1", "tag2"],

The documentation (http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/#Highlighting) says:
There can be a single tag or more, and the “importance” is ordered.

What does it mean?

I did not find any unit test for that in ES code.
Also I think there might be an issue in the code because if I use "styled"
schema then all of my highlighted fragments used only the first tag from the
internal array: i.e.
So the question is: when can I expect the other tags to be used (hlt2, hlt3,
...)?

Two issues to be fixed:

  1. the internal STYLED_PRE_TAG array contains two entries for hlt2
    (copy&paste?). See
    http://github.com/elasticsearch/elasticsearch/blob/master/modules/elasticsearch/src/main/java/org/elasticsearch/search/highlight/HighlighterParseElement.java#L53
    and this issue propagated into doc as well:
    http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/#Highlighting

  2. the doc
    http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/#Highlighting
    uses
    "tag_schema"
    but the code
    http://github.com/elasticsearch/elasticsearch/blob/master/modules/elasticsearch/src/main/java/org/elasticsearch/search/highlight/HighlighterParseElement.java#L90
    uses
    "tags_schema"

Which one is correct? (I assume the code is right, thus "tags_schema")

Regards,
Lukas

On Wed, Jul 21, 2010 at 5:37 PM, Lukáš Vlček lukas.vlcek@gmail.com wrote:

Hi,

how are PRE and POST tags supposed to work in highlighting?

Assume I use:
pre_tags : ["tag1", "tag2"],
post_tags : ["tag1", "tag2"],

The documentation (http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/#Highlighting) says:
There can be a single tag or more, and the “importance” is ordered.

What does it mean?

The Lucene vector highlighter tries to "score" a highlighted hit and that
score is how important it is.

I did not find any unit test for that in ES code.
Also I think there might be an issue in the code because if I use "styled"
schema then all of my highlighted fragments used only the first tag from the
internal array: i.e.
So the question is: when can I expect the other tags to be used
(hlt2, hlt3, ...)?

Depends on your text and your search query.

Two issues to be fixed:

  1. the internal STYLED_PRE_TAG array contains two entries for hlt2
    (copy&paste?). See
    http://github.com/elasticsearch/elasticsearch/blob/master/modules/elasticsearch/src/main/java/org/elasticsearch/search/highlight/HighlighterParseElement.java#L53
    and this issue propagated into doc as well:
    http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/#Highlighting

Yeah, I fixed it in the code. Can you fix the docs?

  2. the doc
    http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/#Highlighting
    uses
    "tag_schema"
    but the code
    http://github.com/elasticsearch/elasticsearch/blob/master/modules/elasticsearch/src/main/java/org/elasticsearch/search/highlight/HighlighterParseElement.java#L90
    uses
    "tags_schema"

Which one is correct? (I assume the code is right, thus "tags_schema")

The code is always right ;). It's tags_schema. Can you fix that in the docs
as well?

Regards,
Lukas
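
For readers following along, here is a minimal sketch of a search request body that uses the option names confirmed above (pre_tags, post_tags, tags_schema). It only builds and prints the JSON and does not contact a cluster. The field name "content", the term query, and the helper name build_highlight_request are assumptions made for illustration, and the angle-bracket tag values are placeholders rather than the values used in the original message.

```python
import json


def build_highlight_request(query_text, field, use_styled_schema=False):
    """Build a search request body with a highlight section.

    Hypothetical helper: the field name and the term query are
    illustrative assumptions, not taken from the thread.
    """
    highlight = {"fields": {field: {}}}
    if use_styled_schema:
        # The code (not the old docs) uses "tags_schema", per the thread.
        highlight["tags_schema"] = "styled"
    else:
        # Tags are listed in order of "importance"; the pre and post
        # arrays are expected to line up entry by entry.
        highlight["pre_tags"] = ["<tag1>", "<tag2>"]
        highlight["post_tags"] = ["</tag1>", "</tag2>"]
    return {"query": {"term": {field: query_text}}, "highlight": highlight}


if __name__ == "__main__":
    print(json.dumps(build_highlight_request("elasticsearch", "content"), indent=2))
```

Passing use_styled_schema=True swaps the explicit tag arrays for the built-in "styled" schema, so a request carries one or the other in this sketch.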

Fixed the docs.

On Wed, Jul 21, 2010 at 7:40 PM, Shay Banon shay.banon@elasticsearch.com wrote:

The Lucene vector highlighter tries to "score" a highlighted hit and that
score is how important it is.

Let's say I have two pre_tags: is the first one used if the score is > 0.5
and the second one if the score is < 0.5?
Something like that?

On Wed, Jul 21, 2010 at 8:55 PM, Lukáš Vlček lukas.vlcek@gmail.com wrote:

Fixed the docs.

thanks!

On Wed, Jul 21, 2010 at 7:40 PM, Shay Banon shay.banon@elasticsearch.com wrote:

The Lucene vector highlighter tries to "score" a highlighted hit and that
score is how important it is.

Let's say I have two pre_tags, then the first one is used if the score is >
0.5 and the second tag is used if score < 0.5 ?
Something like that?

Yes, though don't confuse it with search hit scoring.
