Highlighting highlights all words in all fields when searching for field:*


(Mark Waddle-2) #1

When I execute a search with a query like "field:*", and specify a number
of fields to be highlighted, it seems to highlight all words, except stop
words, for all of those fields. It doesn't matter whether the field
searched on is in the highlighted field set or not. This seems incorrect to
me. It seems like only the words in the field specified in the query should
be highlighted, and only if that field is included in the highlighted field
set of course. Am I doing something wrong? Are my expectations wrong? Is
this a bug?

Thanks,
Mark



(Lukáš Vlček) #2

Hi,

Do you think you could prepare a recreation script?

Regards,
Lukas


(Mark Waddle-2) #3

#!/bin/bash
# Recreation script: start from a clean "highlight" index.
curl -XDELETE 'http://localhost:9200/highlight'
echo
curl -XPOST 'http://localhost:9200/highlight'
echo
# _source is disabled, so both text fields are stored explicitly.
curl -XPUT -d '{
  "doc" : {
    "_source" : { "enabled" : false },
    "_all" : { "enabled" : true },
    "_timestamp" : { "enabled" : true, "store" : "yes" },
    "properties" : {
      "text1" : {
        "type" : "string",
        "store" : "yes"
      },
      "text2" : {
        "type" : "string",
        "store" : "yes"
      }
    }
  }
}' 'http://localhost:9200/highlight/doc/_mapping'
echo
# Index two documents, then refresh so they are searchable.
curl -XPOST -d '{ "text1": "doc1 and text1", "text2": "doc1 and text2" }' 'http://localhost:9200/highlight/doc'
echo
curl -XPOST -d '{ "text1": "doc2 and text1", "text2": "doc2 and text2" }' 'http://localhost:9200/highlight/doc'
echo
curl -XPOST 'http://localhost:9200/highlight/_refresh'
echo
# Query only text1, but request highlighting on both text1 and text2.
curl -d '{ "query" : { "query_string" : { "query" : "text1:*" } }, "fields" : ["text1", "text2"], "highlight" : { "fields" : { "text1" : {}, "text2" : {} } } }' 'http://localhost:9200/highlight/_search?pretty=true'
echo



(Mark Waddle-2) #4

Has anyone been able to reproduce this with the script?



(Shaun Etherton) #5

Hi

It seems ok to me.

This is the result I get running your script. I'm using elasticsearch-0.19.4.
https://gist.github.com/3399280

I'd say the query and the highlight are separate things, and treating them separately is correct, i.e., the results are correct and expected.
I ran the script as it was and then ran the query again with the text2 field removed from the highlight section, and it behaves as I'd expect.
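
The second run described above might look roughly like the following. This is only a sketch: it assumes the same local node and "highlight" index as the recreation script, and ES_URL is a variable name introduced here for illustration. Only text1 is left in the highlight section, while both fields are still returned.

```shell
#!/bin/bash
# Assumption: the same local node used by the recreation script.
ES_URL="${ES_URL:-http://localhost:9200}"

# Same query as the recreation script, but text2 is removed from the
# highlight section, so only text1 can come back with highlight fragments.
BODY='{
  "query" : { "query_string" : { "query" : "text1:*" } },
  "fields" : ["text1", "text2"],
  "highlight" : { "fields" : { "text1" : {} } }
}'

curl -s --max-time 5 -XPOST -d "$BODY" "$ES_URL/highlight/_search?pretty=true" \
  || echo "request failed (is Elasticsearch running at $ES_URL?)"
echo
```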

HTH.

cheers



(Mark Waddle-2) #6

Thank you, Shaun, for trying it out. I am glad to see that you got the same
results as me, but our expectations are different. I expected that even
though I asked for highlighting on both text1 and text2, it would only
highlight text1, because the query match was limited to text1 (text1:*).

However now that I think about it, this happens with any type of query,
whether it is a wildcard or not. For example, if I execute a query of
title:wireless, and request highlighting on the title and abstract field,
it will highlight the word wireless in both. It is just especially heinous
when doing a wildcard search because it highlights all terms in all
highlighted fields. I guess it is because the highlighter is not aware of
which field(s) the query actually hit on, and just highlights the search
term in all fields where highlighting is requested.
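
For what it is worth, the Elasticsearch highlighting reference describes a require_field_match option that restricts highlighting to fields the query actually matched. The sketch below assumes that option is available on the version in use (not confirmed here for 0.19.4) and that it takes effect for a wildcard query_string query like this one; ES_URL is a name introduced only for illustration.

```shell
#!/bin/bash
# Assumption: the same local node used by the recreation script.
ES_URL="${ES_URL:-http://localhost:9200}"

# Assumption: require_field_match is supported by the running version.
# Highlighting is still requested on both fields, but the option asks the
# highlighter to skip fields that the query did not match.
BODY='{
  "query" : { "query_string" : { "query" : "text1:*" } },
  "fields" : ["text1", "text2"],
  "highlight" : {
    "require_field_match" : true,
    "fields" : { "text1" : {}, "text2" : {} }
  }
}'

curl -s --max-time 5 -XPOST -d "$BODY" "$ES_URL/highlight/_search?pretty=true" \
  || echo "request failed (is Elasticsearch running at $ES_URL?)"
echo
```

If the option behaves as documented, the text2 entries should no longer appear in the highlight section of each hit; comparing the output with the original script's output is the quickest way to check.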

I think I will open an enhancement issue.



(system) #7