I would like some opinions on custom query parsing. I have
read the documentation but couldn't find a good fit for my
requirements.
In my use case I plan to use two query parsers, one basic and one
advanced. The advanced one would parse queries the way query_string
does, but with some features disabled, for example fuzzy and wildcard
queries. The basic one would just analyze the input into tokens and
require all of them (a boolean query with a "must" clause per token)
in three fields with different boosts, combined with a dismax across
the three fields.
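To make the basic parser concrete, here is a minimal sketch of the query it could produce once the tokens are known. It follows one reading of the description above: one bool query per field with a "must" clause for every token, all wrapped in a dis_max. The function name `build_basic_query` and the third field `title` are made up for illustration; only `content` and `name` appear in this thread, and the tokens are assumed to have been analyzed already.

```python
import json


def build_basic_query(tokens, field_boosts):
    """Build a dis_max query over one bool/must query per field.

    tokens       -- already-analyzed terms (assumed to come from the analyzer)
    field_boosts -- mapping of field name to boost, e.g. {"name": 5.0}
    """
    per_field = []
    for field, boost in field_boosts.items():
        # Every token must match in this field.
        must = [{"term": {field: {"value": token}}} for token in tokens]
        per_field.append({"bool": {"must": must, "boost": boost}})
    # dis_max scores a document by its best-matching field.
    return {"dis_max": {"queries": per_field}}


if __name__ == "__main__":
    # "title" is a hypothetical third field used only for this example.
    boosts = {"content": 1.0, "name": 5.0, "title": 2.0}
    print(json.dumps({"query": build_basic_query(["this", "that"], boosts)}, indent=2))
```

This is still the client-side approach, so it does not remove the extra analyzer call; it only shows the shape of the query that would have to be built by hand.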
Right now I can't find a clean way to implement either use case. For
the basic one I thought of calling the ES analyzer to get the tokens,
building the boolean and dismax queries, and finally calling ES again
(two calls, plus some hand-coded query building in client code).
For the advanced case I could strip the characters QueryParser uses
for fuzzy and wildcard, like *, ? and ~ (yes, removing ~ would break
phrase slop too, since the number after the phrase would become
meaningless, so it's not a good solution), but I hope something
tidier is possible.
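A slightly less destructive variant of the character-stripping idea is to backslash-escape the operators instead of deleting them, and to leave a ~ alone when it directly follows a closing quote so that phrase slop keeps working. The sketch below is only that, a client-side workaround: the function name is made up, it assumes the Lucene convention that a backslash escapes a special character, and it assumes text inside a quoted phrase needs no escaping. The escaped characters are still handed to the analyzer, which may drop them.

```python
def neutralize_fuzzy_and_wildcard(query):
    """Escape *, ? and fuzzy ~ outside phrases; keep "phrase"~N slop intact."""
    out = []
    in_phrase = False      # currently between an opening and closing quote
    escaped = False        # previous character was an unconsumed backslash
    just_closed = False    # previous character closed a phrase
    for ch in query:
        closed_now = False
        if escaped:
            # Already escaped by the user: copy through unchanged.
            out.append(ch)
            escaped = False
        elif ch == "\\":
            out.append(ch)
            escaped = True
        elif ch == '"':
            closed_now = in_phrase
            in_phrase = not in_phrase
            out.append(ch)
        elif in_phrase:
            out.append(ch)
        elif ch in "*?" or (ch == "~" and not just_closed):
            # Wildcard or fuzzy operator: make it a literal character.
            out.append("\\" + ch)
        else:
            out.append(ch)
        just_closed = closed_now
    return "".join(out)


if __name__ == "__main__":
    print(neutralize_fuzzy_and_wildcard('foo* b?r roam~0.8 "jakarta apache"~10'))
```

A server-side parser that simply refuses to build fuzzy and wildcard queries would still be tidier than rewriting the query string like this.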
I think that if I could add a custom query parser to ES, it would
solve both cases on the ES side, without calling the analyzer first
and incurring two network calls. Another nice thing would be to expose
parts of the current query_string functionality, e.g.:
"query_string" : {
"fields" : ["content", "name^5"],
"query" : "this AND that OR thus",
"use_dis_max" : true
}
In the example, "fields" and "use_dis_max" would fit my case if I could
change the way "query" gets parsed. Right now, if I implement a parser,
I would have to write similar code to run the query against all the
fields and join the results in a dismax query. What I mean is a way to
build the parsing from pluggable components that can be configured in
the yml and called dynamically at query time.
So, my questions are:
- Is there a way to reuse part of the existing query parser, as in the
  example I pasted?
- How can I register a custom query parser, and how can I call it from
  the client API?
You can implement your own extension of the Lucene QueryParser to disable fuzzy searches (actually, extend ES's own extension of the Lucene query parser). There is some initial support for custom query parser implementations, though it is not properly tested yet (and thus not published as a feature).
-shay.banon
On Tuesday, December 7, 2010 at 4:12 AM, Sebastian wrote: