How do I help the users understand some unexpected search hits (Or how can I do "highlighting" on _all)

mooky · September 4, 2014, 5:41pm

I am indexing some entities that have up to 140 fields in the resulting
document, i.e. lots.
I am providing a simple but powerful Google-style search over these entities
using the _all field. To make the users' lives easier, we do prefix
searches
(e.g. rather than having to type "johannesburg" or "aluminium", users
can just type "joh" or "alu").

We display the results in a grid (with far fewer than 140
columns!)

The users are new to this kind of search, and while they appreciate the
many benefits, they are sometimes confused by hits they don't expect.
E.g. they may search for "johannesburg", expecting hits on the
location, but also get some odd hits because someone put "johannesburg" in
a comment on an item whose location is not Johannesburg. This is
compounded by the fact that they can't necessarily see why they got a
particular hit (because we show fewer than 140 columns, and some fields,
like comments, are unsuitable to show in a grid).

In my experience it's a fairly common problem: you want to show
the user the fields they can search on, but in reality there are always
more fields you want to search on than you want to display (especially as
columns).

The question is how to help the user see why something matched.

The problem is that we are searching on _all, so traditional highlighting
doesn't (and probably never will) help.

My question is: are there other tricks anyone can suggest that
will help users understand why they got unexpected hits?

E.g. one of my initial thoughts is that, by its nature, prefix search means
users might get more false positives than expected simply because they
haven't typed enough characters. For example, "joh" will match all items
located in "Johannesburg", but also all items created by "John". My thought
was that simply showing the matching term (in a tooltip) might help: if
the user sees "John", they know that typing just one more character
("joha") will eliminate a raft of false positives.
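A minimal sketch of that tooltip idea, done client-side rather than with highlighting: since _all is built from (by default) all of the document's other fields, you can re-scan each hit's `_source` for terms starting with the prefix and report which field each one came from. That also surfaces matches in fields you don't show in the grid, such as comments. The tokenizer below (lowercase, then split on anything other than word characters, `.` and `-`) is an assumption standing in for whatever analyzer your _all field really uses, and the field names in the example are hypothetical.

```python
import re
from collections.abc import Iterator
from typing import Any

# Assumption: a rough stand-in for the _all analyzer. Lowercases and keeps
# runs of word characters, '.' and '-' (so "2013-10-01" stays one term).
TOKEN_RE = re.compile(r"[\w.-]+")


def iter_leaves(doc: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (dotted.field.path, string value) for every scalar in _source."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from iter_leaves(value, f"{path}.{key}" if path else key)
    elif isinstance(doc, list):
        for item in doc:
            yield from iter_leaves(item, path)
    elif doc is not None:
        yield path, str(doc)


def explain_prefix_hit(source: dict, prefix: str) -> list[tuple[str, str]]:
    """Return (field, term) pairs where the term starts with the prefix."""
    prefix = prefix.lower()
    hits = []
    for field, text in iter_leaves(source):
        for term in TOKEN_RE.findall(text.lower()):
            if term.startswith(prefix):
                hits.append((field, term))
    return hits
```

For a hit like `{"location": "Johannesburg", "createdBy": "John Smith", "comments": ["Shipped via Johannesburg, delayed."]}`, the prefix "joh" yields `location`/`johannesburg`, `createdBy`/`john` and `comments`/`johannesburg`, which is exactly what the tooltip would need; "joha" drops the `createdBy` entry.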

Thoughts?

Cheers...

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/dc7bd363-2190-40fc-9e98-f37e8552d33c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

nik9000 · September 4, 2014, 6:27pm

I think the problem is pretty hard. We have about 10 fields and use
the experimental highlighter
(https://github.com/wikimedia/search-highlighter) to highlight
in "chains" using skip_if_last_matched. You could try that. It might not
be fast enough, but it'd help, I think.
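A rough sketch of what such a chain might look like as a request body, built from Python. Only `skip_if_last_matched` comes from this thread; the `"type": "experimental"` value and passing the option through the per-field `options` object are my reading of the plugin's README and should be checked against the version you install. The field names are hypothetical. Note also that JSON objects are formally unordered, so confirm that your Elasticsearch version highlights fields in the order sent (later versions also accept `fields` as an array of single-field objects for exactly this reason).

```python
import json


def chained_highlight(fields_in_priority_order: list[str]) -> dict:
    """Build a highlight section for the wikimedia experimental highlighter.

    Every field after the first gets skip_if_last_matched, intended so that a
    field is skipped when the field before it in the chain already matched.
    Option names are assumptions to verify against the plugin's README.
    """
    fields = {}
    for i, name in enumerate(fields_in_priority_order):
        spec = {"type": "experimental"}
        if i > 0:
            spec["options"] = {"skip_if_last_matched": True}
        fields[name] = spec  # dicts keep insertion order, so the chain order survives
    return {"fields": fields}


if __name__ == "__main__":
    # Hypothetical field names, most "explanatory" first.
    body = {"highlight": chained_highlight(["location", "createdBy", "comments"])}
    print(json.dumps(body, indent=2))
```

The design choice is to put the fields users most expect to match (location, name) first, so a hit is only "explained" by a comment when nothing more obvious matched.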

Nik

mooky · September 8, 2014, 9:16am

I have looked at doing highlighting on _all.
I set store: true, and I am getting results.
I expect the contents of _all to be gobbledygook, so I am limiting the
fragment size to zero so that I just get the highlighted word. (I am
also setting the pre/post tags to empty strings.)
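For reference, the setup described above roughly corresponds to the following 1.x-era mapping and search body, sketched as Python dicts. The index type name `entity` is hypothetical, and so is the use of a `prefix` query: the post doesn't say which query type drives the prefix search. Storing _all matters because _all isn't part of `_source`, and the highlighter needs the field's text from somewhere. Since `prefix` queries are not analyzed, the sketch lowercases the input itself, assuming the _all analyzer lowercases at index time.

```python
import json

# Hypothetical type name; 1.x-style mapping that stores the _all field so
# the highlighter can read its text back (it is not in _source).
MAPPING = {"mappings": {"entity": {"_all": {"store": True}}}}


def highlight_request(user_input: str) -> dict:
    """Prefix query on _all, highlighted with empty tags and fragment_size 0."""
    # Prefix queries are not analyzed; lowercase here, assuming the index
    # lowercased terms when building _all.
    term = user_input.lower()
    return {
        "query": {"prefix": {"_all": term}},
        "highlight": {
            "pre_tags": [""],
            "post_tags": [""],
            "fields": {"_all": {"fragment_size": 0}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(MAPPING, indent=2))
    print(json.dumps(highlight_request("Ya"), indent=2))
```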
What I am not expecting (but am getting) is sometimes multiple words in
one fragment. E.g.,
for a prefix search of "2013", I am getting the following highlights:
1: "2013"
2: "2013-10-01"
3: "2013 0.0000"

I expect all of them except the last one. Why is 0.0000 included, given
that there is a space between it and 2013?
Similarly, for my prefix search of "ya", I get the following highlights:
1: "Yammin"
2: "Yammin 0.0000"

Is (2) expected? Is there a buggette?
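Until the cause is found, a client-side workaround for the tooltip is to keep only the whitespace-separated tokens of each fragment that actually start with the prefix. This is a sketch of a workaround, not a fix: it hides the extra "0.0000" without explaining why the highlighter put it in the fragment, and it assumes the highlighted terms themselves contain no spaces.

```python
def matched_terms(fragments: list[str], prefix: str) -> list[str]:
    """Return the distinct whitespace-separated tokens that start with prefix.

    Workaround only: drops extra words such as "0.0000" that end up in the
    same fragment, without explaining why they are there.
    """
    prefix = prefix.lower()
    seen: set[str] = set()
    terms: list[str] = []
    for fragment in fragments:
        for token in fragment.split():
            # Case-insensitive prefix check; keep first-seen order, no duplicates.
            if token.lower().startswith(prefix) and token not in seen:
                seen.add(token)
                terms.append(token)
    return terms
```

With the highlights above, `matched_terms(["2013", "2013-10-01", "2013 0.0000"], "2013")` gives `["2013", "2013-10-01"]`, and `matched_terms(["Yammin", "Yammin 0.0000"], "ya")` gives `["Yammin"]`.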

Cheers...

mooky · September 9, 2014, 9:15am

Is (2) expected? Is there a buggette?

Does anyone familiar with highlighting have any insight?

nik9000 · September 9, 2014, 11:56am

I imagine it's caused by your analysis configuration. Use the analyze API
and check what it outputs for all those terms.
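A small sketch of that check using only the standard library. It uses the 1.x-era query-string form of `_analyze` (`field=` selects the analyzer mapped to that field, `text=` is the input); newer versions expect a JSON body instead. The index name is hypothetical, and `analyze()` needs a running cluster, so only the URL building and response parsing are exercised offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def analyze_url(base: str, index: str, field: str, text: str) -> str:
    """Build a 1.x-style _analyze URL that uses the analyzer mapped to field."""
    params = urlencode({"field": field, "text": text})
    return f"{base.rstrip('/')}/{index}/_analyze?{params}"


def token_positions(response: dict) -> list[tuple[int, str]]:
    """Extract (position, token) pairs from an _analyze response."""
    return [(t["position"], t["token"]) for t in response.get("tokens", [])]


def analyze(base: str, index: str, field: str, text: str) -> list[tuple[int, str]]:
    """Call a live cluster and return its (position, token) pairs."""
    with urlopen(analyze_url(base, index, field, text)) as resp:
        return token_positions(json.load(resp))
```

If `analyze("http://localhost:9200", "my_index", "_all", "Yammin 0.0000")` returned a single token containing the space, the analyzer would explain the fragment; two tokens at separate positions would point elsewhere.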
mooky · September 25, 2014, 9:37am

Having checked the analysis, it does not look like the culprit...

Related topics

Topic	Category	Replies	Views	Activity
Highlights query - returns too long a snippet	Elasticsearch	1	802	July 5, 2017
Highlighted query has hits but misses highlighted fragments (for some documents)	Elasticsearch	9	683	July 6, 2017
How can I tell what matched?	Elasticsearch	7	459	July 6, 2017
Highlighting highlights all words in all fields when searching for field:*	Elasticsearch	6	845	July 6, 2017
[ANN] Elasticsearch experimental highlighter	Elasticsearch	4	815	July 6, 2017