The top result does not seem to be the most relevant


(mfeingold) #1

I have a list of business names/addresses that I run a query against. I expect
the query to return a list of results with the most relevant one at the top.
But the result at the top does not seem to me to be the most relevant. In
particular, results #2 and #3 have much more in common with the search
string.

The query: https://gist.github.com/3881531
The result of the query (including the explain output): https://gist.github.com/3881449
The mapping: https://gist.github.com/3881560

Any help is greatly appreciated.



(simonw-2) #2

hey man,

I looked briefly at your query and the mapping, and I think this is a very,
very uncommon type of query. Given that your query includes phrases and
multiple fuzzy searches, and your mapping does n-grams on multiple fields
with min_gram = 1 and max_gram = 50 (wow, crazy!), I don't think you can
expect reasonable results from any search engine on this planet. I am happy
to help, but I think we should start by discussing the problem you want to
solve rather than debugging this query; no offense, but I honestly think
that is hopeless.
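A rough way to see the problem with min_gram = 1 and max_gram = 50: a character n-gram filter with those settings emits essentially every substring of each token, so unrelated words end up sharing index terms. The sketch below is a simplified stand-in for such a filter (it ignores lowercasing and token_chars, and the example words are chosen only for illustration), not the actual analyzer from the mapping.

```python
def char_ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    """Every substring of `token` whose length lies in [min_gram, max_gram].

    Simplified stand-in for a character n-gram filter: no lowercasing,
    no token_chars handling, one token at a time.
    """
    grams = []
    for n in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


def shared_grams(a: str, b: str, min_gram: int, max_gram: int) -> set[str]:
    """Index terms two tokens have in common under the same n-gram settings."""
    return set(char_ngrams(a, min_gram, max_gram)) & set(char_ngrams(b, min_gram, max_gram))


if __name__ == "__main__":
    # A 10-letter word already yields 55 index terms with min_gram=1, max_gram=50.
    print(len(char_ngrams("healthcare", 1, 50)))  # 55
    # Two unrelated words overlap on single letters and on "al" ...
    print(sorted(shared_grams("dental", "medical", 1, 50)))  # ['a', 'al', 'd', 'e', 'l']
    # ... but share nothing once only trigrams are indexed.
    print(sorted(shared_grams("dental", "medical", 3, 3)))  # []
```

Every one of those shared single-letter terms is a potential match that contributes to the score, which is one plausible reason an apparently unrelated document can float to the top.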

Can you explain what you are trying to do? From what I see, I guess you want
to optimize for recall rather than precision and do something like an
"already spellchecked query"? It would be good if you could explain it at a
high level.

simon

(In reply to Michael Feingold, Friday, October 12, 2012 11:21:22 PM UTC+2.)


(mfeingold) #3

Thank you for your reply. I realize that I posted more 'code' than any
regular person would care to read. My apologies.

Here is what I am trying to do:

I have a list of providers (whatever they are). For each of them I have
their name and address as well as a set of IDs; there are several types of
them.
I need to be able to look up providers based on these values. The search
form has a single field where I expect the user to type whatever information
they have: any combination of words (or parts thereof) as well as IDs.

My thinking was that I would index both the text and the IDs for exact
matching (with a boost) as well as for partial matching, using n-grams to
emulate wildcards. The exact match is probably redundant, but I wanted to be
able to boost exact matches. I also indexed the partial text fields with a
phonetic filter. A max_gram of 50 is probably overkill, but I did not see
any negative impact.
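To illustrate how a phonetic filter applied to partial (n-gram) text can blur results, here is a sketch using classic American Soundex as a stand-in encoder. This is an assumption: the thread does not say which encoder the mapping uses, and the Elasticsearch phonetic plugin offers several. The point is that very short fragments collapse to the same code no matter which vowel follows the first letter.

```python
# Soundex digit for each consonant; vowels, y, h and w have no code.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so repeats across vowels count again
        if len(code) == 4:
            break
    return code.ljust(4, "0")


if __name__ == "__main__":
    # Whole words: the intended tolerance for spelling variants.
    print(soundex("Smith"), soundex("Smyth"))  # S530 S530
    # Short n-gram fragments: all of these become the same code.
    print({soundex(gram) for gram in ("d", "de", "da", "di", "du")})  # {'D000'}
```

On whole words this kind of collapsing is the desired behavior; on one- and two-letter grams it turns many distinct fragments into a single index term.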

The query then combines all of this, plus some fuzziness to help with
spelling errors.
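Fuzzy queries in Lucene/Elasticsearch accept terms within a small edit distance of the query term. The sketch below uses plain Levenshtein distance (an assumption; newer Lucene versions can also count a transposition as a single edit) to show that a one-edit tolerance that is reasonable for whole words becomes very loose against short n-gram terms. The two-letter vocabulary is hypothetical and chosen only for illustration.

```python
from itertools import product
from string import ascii_lowercase


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_matches(term: str, vocabulary: list[str], max_edits: int) -> list[str]:
    """Terms a simple fuzzy lookup would accept at the given edit distance."""
    return [t for t in vocabulary if levenshtein(term, t) <= max_edits]


if __name__ == "__main__":
    # One edit is a sensible tolerance for a whole word ...
    print(levenshtein("dentl", "dental"))  # 1
    # ... but against two-letter n-gram terms it accepts 51 of 676 possible grams.
    two_letter = ["".join(p) for p in product(ascii_lowercase, repeat=2)]
    print(len(fuzzy_matches("al", two_letter, 1)))  # 51
```

Combined with single-letter and two-letter grams, fuzziness therefore widens the set of matching terms far beyond the spelling errors it is meant to catch.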

The funny thing is that the query does what I want it to do, except that
when I showed it to a customer, the result at the top of the list was
something I did not expect (see the attachments). At first glance, the top
result was not merely less relevant than the others; it was not relevant at
all.

Other than this, I am getting what I want, but you made me wonder: did I
overthink this? Is there a better way to do what I need?

(In reply to simonw, Saturday, October 13, 2012 9:15:32 AM UTC-5.)

