Highlighted query has hits but misses highlighted fragments (for some documents)


(bdonnovan) #1

Hey there,

I am getting somewhat odd results from a search with highlighting. I
query for a simple term and get a result set, but only some of the
results contain a highlighted fragment, which is what I do not
understand. The results themselves are all correct: even the hits
without highlighted fragments do contain the queried term. I just
don't know why no highlighted fragment is returned for them. The
source is stored for all these documents. Interestingly, I do get
highlighted fragments for the same documents if I choose a different
term from that document. I just don't see a pattern here. What other
reasons could there be? I am fairly new to ES, so maybe this is a
no-brainer, but I have not been able to fix this for two days now.

This is the query:

curl -XGET 'http://localhost:9200/docs/doc/_search?pretty=true' -d '{
  "from" : 0,
  "size" : 10,
  "query" : {
    "term" : {
      "fulltext" : "testen"
    }
  },
  "explain" : true,
  "fields" : [ "author", "title", "inserted" ],
  "highlight" : {
    "pre_tags" : [ "" ],
    "post_tags" : [ "" ],
    "fields" : {
      "fulltext" : {
        "fragment_size" : 100
      }
    }
  }
}'

And the results can be found here https://gist.github.com/1986656


(Shay Banon) #2

Can you gist a recreation with sample documents indexed? (see http://www.elasticsearch.org/help).



(bdonnovan) #3

I set up a simple sample application that reproduces it.

Code, mapping and one of the sample documents are gisted here:
https://gist.github.com/1992409

The parser seems to stop parsing the document at a certain point when
fetching the highlighted fragment (line 987, actually).
I don't see any special characters or anything, and there are no
warnings or errors, but also no fragments for occurrences after that
point.

I am using v0.18.7.

And thanks a lot for looking into it, I can't seem to make any
progress on the issue.



(bdonnovan) #4

This is driving me nuts. I can see in Luke that the field is properly
stored (which, as I understand it, is not even necessary, since the
source should be enough to get a fragment for the hit, right?).
So why on earth would Elasticsearch refuse to give me the fragments
for the hit?



(Shay Banon) #5

It is strange… I haven't run it yet, but I will try to do it over the weekend. Just to verify that it's not a problem with the highlighter, can you try storing term vectors in the mapping for the field? It will use a different highlighter in this case (the fast vector highlighter), and it would be interesting to know whether highlighting works in that case.



(Rémi Montagu) #6

I have the same problem with the latest version of ES.
https://gist.github.com/2034954

The field "fulltext" contains the text of a French book of 1000 pages.
If I search with a term at the beginning of this book, ES returns the
document with the field "highlight".
If I search with a term at the end of this book, ES returns the document
but without the field "highlight".



(Rémi Montagu) #7

I tried storing term vectors in the mapping.
With this configuration, I do get more results in the highlight field.

curl -XPUT 'localhost:9200/fr/notices/_mapping' -d '
{
  "notices": {
    "properties": {
      "fulltext": {
        "type": "string",
        "store": "yes",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}
'

Thank you kimchy.



(bdonnovan) #8

Thanks for the reply!

Yes, switching to term vectors did the trick for me as well. I just
assumed it should also work without the term vector, which it still
doesn't. So I can live with that, but the default highlighter might
need a fix there.

Greetings, Brian



(Shay Banon) #9

Ok, I tracked it down: there is a limit on the number of chars
highlighted when using the "default" highlighter. I opened an issue to
change that: https://github.com/elasticsearch/elasticsearch/issues/1796.
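
To illustrate why a character limit produces exactly the symptom described in this thread, here is a toy sketch in Python. It is not Elasticsearch or Lucene code: the function name plain_highlight, the <em> tags and the limit of 50 * 1024 characters are assumptions made for the example, since the thread does not state the actual value. The sketch scans only the first max_chars characters of the text, so a term that matches the query but sits late in a long document yields a hit with no fragment.

```python
import re

# Assumed limit, for illustration only; the thread does not give the real value.
MAX_CHARS_TO_ANALYZE = 50 * 1024


def plain_highlight(text, term, max_chars=MAX_CHARS_TO_ANALYZE,
                    fragment_size=100, pre_tag="<em>", post_tag="</em>"):
    """Return highlighted fragments, scanning only the first max_chars characters.

    Passing max_chars=None scans the whole text.
    """
    analyzed = text[:max_chars]
    pattern = r"\b%s\b" % re.escape(term)
    fragments = []
    for match in re.finditer(pattern, analyzed, re.IGNORECASE):
        # Take roughly fragment_size characters of context around the match.
        start = max(0, match.start() - fragment_size // 2)
        end = min(len(analyzed), match.end() + fragment_size // 2)
        fragments.append(
            analyzed[start:match.start()]
            + pre_tag + match.group(0) + post_tag
            + analyzed[match.end():end]
        )
    return fragments


# Two documents of about 100,000 characters that both contain the term.
early_doc = "testen " + "wort " * 20000   # term at the very beginning
late_doc = "wort " * 20000 + "testen"     # term at the very end

print(len(plain_highlight(early_doc, "testen")))                 # fragment found
print(len(plain_highlight(late_doc, "testen")))                  # no fragment
print(len(plain_highlight(late_doc, "testen", max_chars=None)))  # found without the limit
```

Both documents would match a term query for "testen", but only the one where the term falls inside the analyzed prefix gets a fragment, which matches the "beginning of the book" versus "end of the book" behaviour reported above.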

