Highlight not always shown


(max) #1

Hi,

We are using Elasticsearch to index a lot of HTML documents (right
now around 90k; the index size is around 2GB). Everything is great, we love
Elasticsearch!

But we noticed one thing we cannot understand. We use the following
search:
{
  "from" : 0, "size" : 10,
  "sort" : [
    {"time" : "desc"}
  ],
  "query" : {
    "filtered" : {
      "query" : {
        "query_string" : {
          "fields" : ["title", "text"],
          "query" : "",
          "default_operator" : "AND"
        }
      },
      "filter" : {
        "and" : [
          {
            "term" : {"user" : ""}
          }
        ]
      }
    }
  },
  "highlight" : {
    "pre_tags" : [""],
    "post_tags" : [""],
    "fields" : {
      "text" : {"fragment_size" : 100, "number_of_fragments" : 1}
    }
  }
}
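
For readers who want to reproduce the request with different search words, here is a minimal Python sketch that builds the same body as a dict. The function name `build_search_body` and its four parameters are hypothetical; they stand in for the values that appear empty in the post (the query text, the user, and the highlight tags). The example values in the `__main__` block are made up for illustration.

```python
import json


def build_search_body(query_text, user, pre_tag, post_tag):
    """Return the search request from the post as a Python dict.

    All four arguments are placeholders for values that are empty in the
    original post; they are assumptions, not the poster's real settings.
    """
    return {
        "from": 0,
        "size": 10,
        "sort": [{"time": "desc"}],
        "query": {
            "filtered": {
                "query": {
                    "query_string": {
                        "fields": ["title", "text"],
                        "query": query_text,
                        "default_operator": "AND",
                    }
                },
                # Restrict the results to one user, as in the post.
                "filter": {"and": [{"term": {"user": user}}]},
            }
        },
        "highlight": {
            "pre_tags": [pre_tag],
            "post_tags": [post_tag],
            "fields": {
                "text": {"fragment_size": 100, "number_of_fragments": 1}
            },
        },
    }


if __name__ == "__main__":
    # Made-up example values, only to show the serialized request.
    body = build_search_body("example words", "some_user", "<em>", "</em>")
    print(json.dumps(body, indent=2))
```

Keeping the body in one function makes it easy to change only the query text between runs, which is the situation described in the post.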

The search finds the right documents, but the highlight field is
sometimes filled and sometimes empty (the text field is stored as a
field too, as we had to do before 0.14.0). If I search for
different words that match the same document, the highlight is there
for some words and missing for others.
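
To narrow this down, it can help to list exactly which hits come back without a highlight for a given search word. The sketch below is one way to do that, assuming the usual search response layout (`hits.hits[]` entries with `_id` and an optional `highlight` object keyed by field name). The helper name `hits_without_highlight` and the sample response are made up for illustration.

```python
def hits_without_highlight(response, field="text"):
    """Return the _id of every hit that has no highlight fragments for `field`."""
    missing = []
    for hit in response.get("hits", {}).get("hits", []):
        # A hit may lack the "highlight" key entirely, or lack this field.
        fragments = hit.get("highlight", {}).get(field, [])
        if not fragments:
            missing.append(hit.get("_id"))
    return missing


if __name__ == "__main__":
    # Hand-written sample response, not real output from the poster's index.
    sample = {
        "hits": {
            "hits": [
                {"_id": "1", "highlight": {"text": ["some <em>word</em> here"]}},
                {"_id": "2"},
            ]
        }
    }
    print(hits_without_highlight(sample))  # prints ['2']
```

Running this for several search words against the same documents gives a concrete list of word and document pairs where the highlight is missing.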

any ideas why?

Thanks a lot and keep up the great work!
MAX


(Shay Banon) #2

Heya,

It's a bug that has been fixed in the master / 0.14 branch.

-shay.banon
On Tuesday, January 18, 2011 at 2:48 PM, max wrote:



(max) #3

Hi,

thank you for the fast response!

We ran "./gradlew clean release" and installed the resulting 0.15.0-SNAPSHOT,
but we still have the same problem. Sometimes the highlight is there,
sometimes not. Any ideas why?

thank you very much,
MAX

On Jan 18, 7:26 pm, Shay Banon shay.ba...@elasticsearch.com wrote:



(Lukáš Vlček) #4

Max,

How do you analyze the HTML source text, and how do you store it in ES? Are
you sure you removed all HTML tags before highlighting takes place?

Regards,
Lukas
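
As an illustration of the tag removal asked about here, the following is a minimal sketch that strips markup before indexing, using only Python's standard `html.parser` module. It is not the poster's actual pipeline; the names `_TextExtractor` and `strip_html` are hypothetical, and skipping `script` and `style` content is an assumption about what should not be indexed.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of script and style elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space so adjacent blocks do not run together,
        # then collapse all whitespace runs. Note this also inserts a space
        # where a tag splits a single word.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html):
    """Return the plain text of an HTML string, with tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    print(strip_html("<p>Hello <b>world</b></p><script>var x = 1;</script>"))
```

Indexing the output of such a step, rather than the raw HTML, means the highlighter works on the same plain text that was analyzed.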

On Fri, Jan 21, 2011 at 11:06 AM, max max@kossatz.com wrote:



(max) #5

Yes, all HTML tags are removed.
The "funny" thing is: for the same document, the highlight is or is not
in the search result depending on the search term. The
document itself is always found.

MAX

On Jan 21, 11:10 am, Lukáš Vlček lukas.vl...@gmail.com wrote:


