I'd like to build a specific ES query, but I can't find how to do it. Here
is what I would like to do:
The query would be a multi_match that keeps only the documents that have
all the terms in the query (the terms can be spread across several fields).
The default multi_match gives me documents that match at least one term
of the query (too many documents), and if I add "operator": "and" it gives
me only documents that have all the terms of the query in one particular
field (too strict).
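To make the two behaviours concrete, here is a minimal sketch of both request bodies as plain Python dicts. The field names (`title`, `description`) and the query string are made up for illustration, and the helper `multi_match` is hypothetical; only the `multi_match` query and its `operator` parameter come from Elasticsearch itself.

```python
# Hypothetical field names, used only for illustration.
FIELDS = ["title", "description"]


def multi_match(query, fields, operator="or"):
    """Build a multi_match request body as a plain dict."""
    return {
        "query": {
            "multi_match": {
                "query": query,
                "fields": fields,
                "operator": operator,
            }
        }
    }


# Matches documents containing at least one term in any field (too many hits).
any_term = multi_match("quick brown fox", FIELDS)

# Matches only documents where a single field contains every term (too strict).
all_in_one_field = multi_match("quick brown fox", FIELDS, operator="and")
```

What is wanted sits between the two: every term must match, but each term may match in a different field.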
As a solution, I was thinking about using the ES analyzer to get the
list of tokens, then doing an "AND" of multi_match queries, one per
token across all the fields, but that introduces a big overhead as there
are two API calls to ES.
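The second step of that workaround could look roughly like the sketch below: one multi_match per token, combined under a bool query's `must` clause. The token list is hard-coded here as a stand-in for whatever the analyze call returns, and the function name and field names are hypothetical.

```python
def all_terms_query(tokens, fields):
    """Require every token to match in at least one of the given fields."""
    return {
        "query": {
            "bool": {
                # Each clause must match, but each may match in a different field.
                "must": [
                    {"multi_match": {"query": token, "fields": fields}}
                    for token in tokens
                ]
            }
        }
    }


# Stand-in for the tokens returned by the analyze step (the first API call).
tokens = ["quick", "brown", "fox"]
per_token_body = all_terms_query(tokens, ["title", "description"])
```

The resulting body is what the second API call would send; the extra round trip for the analysis is the overhead mentioned above.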
Could you give me any pointers on how I could build such a query?
Thanks,
Thibaut
I'd probably just aggregate all the fields you are interested in into one
field and then match/AND on that one field. You can probably use copy_to to
combine them into a single field.
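As a rough illustration of that suggestion, the sketch below shows field mappings that copy two fields into a combined one, followed by a match query with the "and" operator on the combined field. The field names (`title`, `description`, `all_text`) are invented, and the `text` type is an assumption that fits recent releases; the 1.x releases current at the time of this thread used `string` instead.

```python
# Assumption: "text" is the analyzed string type on recent releases;
# on the 1.x line this would have been "string".
TEXT_TYPE = "text"

# Field mappings only; how they are wrapped in an index mapping depends on the version.
properties = {
    "title": {"type": TEXT_TYPE, "copy_to": "all_text"},
    "description": {"type": TEXT_TYPE, "copy_to": "all_text"},
    # Hypothetical combined field that receives the values of both fields above.
    "all_text": {"type": TEXT_TYPE},
}

# Every term must appear in the combined field, whichever source field it came from.
copy_to_search = {
    "query": {
        "match": {
            "all_text": {"query": "quick brown fox", "operator": "and"}
        }
    }
}
```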
Thank you! I haven't tried it yet, but reading the documentation makes me
understand that it will solve my needs.
Is it possible to keep the boosting applied to the individual fields when
computing the score?
Should I keep the original query and add the match/AND you're talking
about as a query filter on the field created by copy_to?
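One way to picture that idea is sketched below: the original multi_match, with query-time boosts, does the scoring, while a match/AND on the combined field only restricts which documents qualify. This uses the `filter` clause of the bool query, which is an assumption tied to later Elasticsearch releases; the 1.x line used a `filtered` query for the same purpose. The function name and all field names are hypothetical.

```python
def scored_and_filtered(query, boosted_fields, combined_field):
    """Score with the boosted multi_match; require all terms via a non-scoring filter."""
    return {
        "query": {
            "bool": {
                # Scoring clause: query-time boosts such as "title^3" apply here.
                "must": [
                    {"multi_match": {"query": query, "fields": boosted_fields}}
                ],
                # Filter clause: narrows the result set without affecting the score.
                "filter": [
                    {"match": {combined_field: {"query": query, "operator": "and"}}}
                ],
            }
        }
    }


filtered_body = scored_and_filtered(
    "quick brown fox", ["title^3", "description"], "all_text"
)
```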
(In reply to Binh Ly's message of Tuesday, March 4, 2014, 11:52:26 PM UTC+1.)
Is it possible to keep the boosting applied to the individual fields when
computing the score?
No. Field-level index time boosts will not be preserved with copy_to.
Coming very soon in 1.1.0 is the cross_fields type of multi_match query,
which is designed to solve exactly the problem you are dealing with. Have
a look at the docs.
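For reference, a cross_fields request could look something like the sketch below. The field names and the `^2` boost are made up for illustration; `type`, `fields`, and `operator` are regular multi_match parameters.

```python
# Hypothetical fields; "title^2" applies a query-time boost to the title field.
cross_fields_body = {
    "query": {
        "multi_match": {
            "query": "quick brown fox",
            "type": "cross_fields",
            "fields": ["title^2", "description"],
            # Every term must match, but each term may match in a different field.
            "operator": "and",
        }
    }
}
```

Note that the term-by-term behaviour applies to fields that share the same analyzer, so this sketch assumes `title` and `description` are analyzed the same way.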
Thanks, I actually came across this part of the documentation a few hours
ago, and this sentence is exactly what I was looking for: "It first
analyzes the query string into individual terms, then looks for each term
in any of the fields, as though they were one big field."
I can't wait to upgrade to 1.1.0, then!
(In reply to Clinton Gormley's message of Wednesday, March 5, 2014, 1:52:31 PM UTC+1.)