Cross field extension to MultiMatch


(Taras Shkvarchuk) #1

I wrote a patch to the MultiMatch query that provides more natural AND processing across multiple fields.

Consider a document with these fields:

  • title: Something
  • description: featured on their 1969 album Abbey Road
  • author: Beatles

Now, if I take a user's input and run a query to match my documents, it would
be natural to consider either the dreaded _all field or a multi_match query
like:

{"multi_match": {"query": "Something Beatles", "fields": ["title", "description", "author"], "operator": "and"}}

This gets transformed into a boolean query such as:

(+title:something +title:beatles) (+description:something +description:beatles) (+author:something +author:beatles)
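To make the field-centric expansion concrete, here is a minimal Python sketch. It is a toy model, not Elasticsearch code: the analyzer is assumed to just lowercase and split on whitespace, and the helper names (`analyze`, `field_centric_and`, `to_lucene`, `matches`) are hypothetical. A document matches when at least one per-field group contains all of its required terms, mirroring the "should of musts" structure above.

```python
# Toy model of multi_match with "operator": "and" (field-centric expansion).
# Assumption: a trivial analyzer that lowercases and splits on whitespace.

def analyze(text):
    """Hypothetical stand-in analyzer: lowercase, whitespace split."""
    return text.lower().split()

def field_centric_and(query, fields):
    """One group per field; every query term is required inside that field."""
    terms = analyze(query)
    return [[(field, term) for term in terms] for field in fields]

def to_lucene(groups):
    """Render groups in Lucene-style syntax, where +clause means must."""
    return " ".join(
        "(" + " ".join(f"+{f}:{t}" for f, t in group) + ")" for group in groups
    )

def matches(groups, doc):
    """The outer groups are should clauses: at least one must fully match."""
    return any(
        all(t in analyze(doc.get(f, "")) for f, t in group) for group in groups
    )

doc = {
    "title": "Something",
    "description": "featured on their 1969 album Abbey Road",
    "author": "Beatles",
}
groups = field_centric_and("Something Beatles", ["title", "description", "author"])
print(to_lucene(groups))
print(matches(groups, doc))  # False: no single field holds both terms
```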

There is no match for our document! From a human-input perspective, the most
natural way to AND a multi-field search is often to ensure that each term is
matched somewhere across all the fields, such as:

+(title:something description:something author:something) +(title:beatles description:beatles author:beatles)
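The term-centric alternative can be sketched the same way. Again this is a toy model with a hypothetical lowercase/whitespace analyzer and made-up helper names: each term gets its own required group, and any field may satisfy it.

```python
# Toy model of the term-centric ("across fields") AND expansion.
# Assumption: a trivial analyzer that lowercases and splits on whitespace.

def analyze(text):
    """Hypothetical stand-in analyzer: lowercase, whitespace split."""
    return text.lower().split()

def term_centric_and(query, fields):
    """One group per term; the term must appear in at least one field."""
    return [[(field, term) for field in fields] for term in analyze(query)]

def to_lucene(groups):
    """Render each required term group as +(field:term field:term ...)."""
    return " ".join(
        "+(" + " ".join(f"{f}:{t}" for f, t in group) + ")" for group in groups
    )

def matches(groups, doc):
    """Every term group is required; within a group any field may match."""
    return all(
        any(t in analyze(doc.get(f, "")) for f, t in group) for group in groups
    )

doc = {
    "title": "Something",
    "description": "featured on their 1969 album Abbey Road",
    "author": "Beatles",
}
groups = term_centric_and("Something Beatles", ["title", "description", "author"])
print(to_lucene(groups))
print(matches(groups, doc))  # True: "something" in title, "beatles" in author
```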

My patch does exactly that, and it also accounts for multiple analyzers that
may remove tokens from some fields (e.g. "The Beatles"). If an analyzer skips
a token, that token becomes a should requirement on the remaining fields
instead of a must.
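A rough sketch of this analyzer-aware rule, under clearly stated assumptions: the per-field analyzers here (`plain`, `no_stopwords`) and the helpers `build_across` and `matches` are hypothetical stand-ins that only lowercase, split on whitespace and optionally drop stopwords, so token text is comparable across fields. This is not the code from the linked commit; it only illustrates the described behavior of downgrading a token to a should clause when some field's analyzer removed it.

```python
# Toy model of a cross-field AND that tolerates different per-field analyzers.
# Assumption: analyzers only lowercase, split on whitespace and may drop
# stopwords, so a surviving token has the same text in every field.

STOPWORDS = {"the", "a", "an", "of"}

def plain(text):
    return text.lower().split()

def no_stopwords(text):
    return [t for t in plain(text) if t not in STOPWORDS]

def build_across(query, field_analyzers):
    """Return one (occur, [(field, term), ...]) group per distinct token.

    A token kept by every field's analyzer becomes a "must" group; a token that
    some analyzer removed becomes a "should" group on the fields that kept it.
    """
    per_field = {f: analyzer(query) for f, analyzer in field_analyzers.items()}
    groups = []
    for term in dict.fromkeys(plain(query)):  # distinct tokens, in query order
        kept_by = [f for f, tokens in per_field.items() if term in tokens]
        if not kept_by:
            continue  # dropped by every analyzer: no clause at all
        occur = "must" if len(kept_by) == len(per_field) else "should"
        groups.append((occur, [(f, term) for f in kept_by]))
    return groups

def matches(groups, doc, field_analyzers):
    """All must groups are required; should groups count only if no must exists."""
    def hit(group):
        return any(t in field_analyzers[f](doc.get(f, "")) for f, t in group)
    musts = [g for occur, g in groups if occur == "must"]
    if musts:
        return all(hit(g) for g in musts)
    return any(hit(g) for _, g in groups)

analyzers = {"title": no_stopwords, "description": no_stopwords, "author": plain}
doc = {"title": "Something",
       "description": "featured on their 1969 album Abbey Road",
       "author": "Beatles"}

groups = build_across("The Beatles", analyzers)
print(groups)  # "the" survives only on author, so it becomes a should group
print(matches(groups, doc, analyzers))      # True
strict = [("must", g) for _, g in groups]   # every token required
print(matches(strict, doc, analyzers))      # False: author:the never matches
```

The strict variant shows why keeping every token as a must fails: the only field that keeps "the" is author, and the document's author value does not contain it.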

I am reusing the match query's facilities for minimum should match as well as
fuzzy processing, so a new match type felt natural:

{"multi_match": {"query": "Something Beatles", "fields": ["title", "description", "author"], "type": "across"}}

You can see the patch here:
https://github.com/tarass/elasticsearch/commit/8f9fe6c51172f00901b42be621670dd6b2e211ee

I would like some feedback on whether others find this useful and whether
adding it to MultiMatch is the right approach versus writing a plugin. I will
also be adding more unit tests, but this passes with flying colors on our site.

-Taras

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Matt Weber) #2

No patch necessary, just use a query string query.

{"query":{"query_string":{"query":"something beatles","fields":["title","description","author"],"default_operator":"AND"}}}

Thanks,
Matt Weber

On Tue, Apr 23, 2013 at 5:54 PM, Taras Shkvarchuk tarass@gmail.com wrote:



(Taras Shkvarchuk) #3

That doesn't quite do the right thing, although it looks close at first glance.

  1. query_string can't be fed raw user input, because it parses markup inside
    the query instead of treating it purely as text.
  2. When fields are analyzed differently, stopwords and other removed tokens
    make the query return no results: "The Beatles" will not match if one
    field uses a stopword filter and another does not.

So if you can think of a different approach, I'm all ears.

On Wednesday, April 24, 2013 9:34:56 AM UTC-7, Matt Weber wrote:



(Felix) #4

I'd love to have this feature!
Why don't you open an issue on elasticsearch with a pull request?

