Currently I am working on an application which allows users to search on
multiple different fields using our own query language. This sometimes
requires us to combine a search for analyzed text with another query using
an "and" operator. For example, someone could search for:
some text AND color:green
We normally do this by combining a match query with another query for the
color using a boolean query. This is fine, but if "some text" consists of
only stop words, then we will get no results. For example, if someone
searches for:
is AND color:green
then we will get no results. While we can't do anything useful with text
that contains only stop words, we would prefer to treat it as a match_all
rather than a match_none. Currently, a match query with only stop words
yields a Lucene BooleanQuery with no clauses, which never matches any
documents. Ideally, we would like a query that yields a Lucene
MatchAllDocsQuery when it receives no tokens from the analyzer.
This can for example be achieved by running a field query with the
following query text:
+({possible stopword text here}) *
Unfortunately, this feels like a hack, and I'd rather not construct Lucene
query strings that are just going to be parsed again immediately if I can
avoid it. Is there a better way to get similar semantics using
Elasticsearch queries directly? It's not a big deal, but it would be nice
to implement behavior similar to what Lucene query strings already do,
with different semantics for our application.
So does anyone have any recommendations for making such a query without
using a Lucene query string? Or is that the best way to do this?
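One way to get these semantics without a Lucene query string is to decide on the client side whether any term survives analysis, and to swap in a match_all clause when none does. The following is only a minimal sketch: the stop-word list, the helper names and the "body" field are hypothetical, and the whitespace tokenizer is a stand-in that would have to mirror the analyzer actually configured on the field.

```python
# Hypothetical stop-word list; in practice it must mirror the analyzer
# configured for the field on the Elasticsearch side.
STOP_WORDS = frozenset({"a", "an", "and", "are", "is", "of", "the", "to"})


def surviving_terms(text):
    """Return the lowercased tokens of `text` that are not stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def text_clause(field, text):
    """Build a match clause, or match_all when no term survives analysis."""
    if surviving_terms(text):
        return {"match": {field: text}}
    return {"match_all": {}}


def build_query(field, text, color):
    """AND the text clause with a term query on the color field."""
    return {
        "bool": {
            "must": [
                text_clause(field, text),
                {"term": {"color": color}},
            ]
        }
    }


# "is" consists only of stop words, so the text clause becomes match_all
# and the search is effectively restricted by color alone.
query = build_query("body", "is", "green")
```

Keeping a local stop-word list in sync with the index settings is fragile. A more faithful variant could ask the cluster which tokens it produces through the analyze API, at the cost of an extra round trip per search.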
This problem comes up quite a lot, but unfortunately there aren't many
options at the moment. MatchQuery (the query produced by the 'match'
query type) currently has hardcoded behaviour for what to do when
analysis strips all the terms from the input.
On Thursday, November 22, 2012 5:37:48 AM UTC+13, John Daniels wrote:
> Hi all!
> [...]
Hi,
I have a question: why was ZeroTermsQuery added only to MatchQuery, and not to QueryStringQuery or MultiMatchQuery?
I am using MultiMatchQuery and I have the same problem. I can't change it to MatchQuery because I need to search multiple fields. Or is there a way to search multiple fields with MatchQuery?
Thanks
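One possible workaround for the multi-field case, assuming a version in which the match query accepts the zero_terms_query option, is to OR together one match query per field inside a bool query. This is only a sketch: the helper name and the field names are hypothetical, and it is not a drop-in replacement for everything multi_match does (per-field boosts and scoring modes are not covered).

```python
def match_any_field(fields, text):
    """OR together one match query per field.

    Each match query sets zero_terms_query to "all", so input that is
    analyzed down to nothing matches every document instead of none.
    """
    return {
        "bool": {
            "should": [
                {"match": {f: {"query": text, "zero_terms_query": "all"}}}
                for f in fields
            ]
        }
    }


# Hypothetical field names; the result can be placed in the "must" list
# of an outer bool query next to the other restrictions.
text_query = match_any_field(["title", "body"], "is")
```

One caveat: if the fields use different analyzers, a single field whose analyzer removes every term turns its clause into a match-all, so the whole OR then matches every document even when the other fields still have terms.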