Leveraging the query parser


(Ivan Brusic) #1

Part of my system accepts strings in the Lucene syntax, which are either
single terms "123" or groups "(123 4 3412)".

With Lucene, I can use a QueryParser to parse a query string and it would
return either a TermQuery or a BooleanQuery. In ElasticSearch, the
QueryParser requires a QueryParseContext, which means it probably cannot be
used outside the context of ElasticSearch. Parsing on the client side also
allows me to check for potential errors in the string.

My current solution is to use Lucene's QueryParser and convert the resulting
Lucene Query objects into their ElasticSearch equivalents. Does a better way
exist using straight ElasticSearch? For example, a way to create a simple
QueryParseContext, or a QueryBuilder that accepts a Lucene Query?
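
A minimal sketch of the conversion step described above, under stated assumptions. The classes Query, TermQuery and BooleanQuery below are simplified, hypothetical stand-ins for the Lucene classes of the same names (only SHOULD clauses are modelled, matching a group such as "(123 4 3412)" parsed with the default OR operator). The field name "id" is made up, and terms are assumed to need no JSON escaping. The output is plain query DSL text using the "term" and "bool" queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Converts a parsed query tree into Elasticsearch query DSL (as JSON text).
final class DslConverter {
    static String toDsl(Query q) {
        if (q instanceof TermQuery) {
            TermQuery t = (TermQuery) q;
            // Assumption: field and text contain no characters that need JSON escaping.
            return "{\"term\":{\"" + t.field + "\":\"" + t.text + "\"}}";
        }
        if (q instanceof BooleanQuery) {
            List<String> parts = new ArrayList<>();
            for (Query clause : ((BooleanQuery) q).shouldClauses) {
                parts.add(toDsl(clause)); // recurse into nested clauses
            }
            return "{\"bool\":{\"should\":[" + String.join(",", parts) + "]}}";
        }
        throw new IllegalArgumentException("Unsupported query type: " + q.getClass().getName());
    }

    public static void main(String[] args) {
        Query single = new TermQuery("id", "123");
        Query group = new BooleanQuery(Arrays.<Query>asList(
                new TermQuery("id", "123"),
                new TermQuery("id", "4"),
                new TermQuery("id", "3412")));
        System.out.println(toDsl(single));
        System.out.println(toDsl(group));
    }
}

// Hypothetical stand-ins; the real Lucene classes carry much more state.
abstract class Query {}

final class TermQuery extends Query {
    final String field;
    final String text;
    TermQuery(String field, String text) { this.field = field; this.text = text; }
}

final class BooleanQuery extends Query {
    final List<Query> shouldClauses; // only SHOULD clauses are modelled here
    BooleanQuery(List<Query> shouldClauses) { this.shouldClauses = shouldClauses; }
}
```

Any query type other than the two handled here is rejected with an exception, so an unexpected parse result fails loudly instead of being silently dropped.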

Cheers,

Ivan

--


(Mark Waddle) #2

The Query DSL supports Lucene queries via the "query_string" query. See
http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query.html.

To validate your queries I recommend using the Validate API, as documented
at http://www.elasticsearch.org/guide/reference/api/validate.html.

--


(Ivan Brusic) #3

I am very aware of the query_string query. I am looking for behavior
analogous to Lucene's QueryParser.

My workflow is not suited for the Validate API. Only a portion of the query
(actually, the filter) is derived from the string in question. If the string
is invalid, it is dropped from the query. The Validate API is all-or-nothing:
it does not easily identify the offending subclause. Besides, it requires a
network hop.
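
A rough sketch of the per-clause workflow described above: each filter string is checked locally and an invalid one is dropped by itself, with no network hop. The method parseTerms is a hypothetical, hand-rolled stand-in for a real client-side parser such as Lucene's QueryParser. It accepts only the shapes mentioned in this thread (one or more plain terms, optionally wrapped in a single pair of parentheses), which is an assumption and not the full Lucene syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FilterValidation {
    // Holds the outcome of validating every filter string independently.
    static final class Result {
        final List<List<String>> accepted = new ArrayList<>();
        final List<String> rejected = new ArrayList<>();
    }

    // Hypothetical stand-in parser: returns the terms, or throws on malformed input.
    static List<String> parseTerms(String input) {
        String s = input.trim();
        if (s.startsWith("(") != s.endsWith(")")) {
            throw new IllegalArgumentException("Unbalanced parentheses: " + input);
        }
        if (s.startsWith("(")) {
            s = s.substring(1, s.length() - 1).trim(); // strip the outer pair
        }
        if (s.isEmpty()) {
            throw new IllegalArgumentException("No terms in: " + input);
        }
        List<String> terms = new ArrayList<>();
        for (String term : s.split("\\s+")) {
            if (!term.matches("[A-Za-z0-9_]+")) {
                throw new IllegalArgumentException("Bad term '" + term + "' in: " + input);
            }
            terms.add(term);
        }
        return terms;
    }

    // Parse each filter on its own; a failure only drops that one filter.
    static Result validateEach(List<String> filters) {
        Result result = new Result();
        for (String filter : filters) {
            try {
                result.accepted.add(parseTerms(filter));
            } catch (IllegalArgumentException e) {
                result.rejected.add(filter);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = validateEach(Arrays.asList("123", "(123 4 3412)", "(123 4"));
        System.out.println("accepted=" + r.accepted + " rejected=" + r.rejected);
    }
}
```

Because every string is parsed separately, the caller knows exactly which subclause was rejected, which is the information the all-or-nothing validation does not readily provide.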

Cheers,

Ivan

--


(Chris Male) #4

Hi,

I'm a little lost as to what you're trying to do. Are you wanting to parse
your query on the client side for validation and then send it to ES to
execute?

--


(Ivan Brusic) #5

Looking to create queries from a string. These queries are only part of the
overall query. The standard approach in Lucene using a QueryParser does not
work in ElasticSearch since the API requires a QueryParseContext. Looking
to create term queries or boolean queries with term queries, not query
string queries (exactly as in Lucene).

--


(Chris Male) #6

So you're doing this on the client side? I think I understand the problem.
Some parts of your queries need a QueryParser and some don't, right? So you
want to parse some parts of your queries client side and then send the
whole overall query to ES. Am I along the right lines now?

--


(Ivan Brusic) #7

Correct. The query creation process is actually quite detailed. The
QueryParser is used for filters that are "almost" Lucene syntax. Each
filter is processed (into Lucene syntax), parsed, reprocessed (adding in
things removed in the first step) and added to the overall query. Lucene's
QueryParser is a great class.

My current solution works, but is a bit kludgy (I hate using instanceof). I
have a long history of taking Shay's code and using it in ways it was not
meant to be used :)
http://forum.compass-project.org/message.jspa?messageID=294616#294616

Ivan

--


(Chris Male) #8

Hmm... I'm not really sure there is a particularly elegant way of doing
this. There has been some development in Lucene
(https://issues.apache.org/jira/browse/LUCENE-4012) to make it easy to
convert Queries into JSON. Maybe there is something in there that you can
pick and misuse :)

--


(Shay Banon) #9

Your current approach is the one that should be used. Do you really need to use the ES query parser (because of the mappings support)?

--


(phill) #10

In my case, I'm using the ES query parser, before sending a search off
to ES, to get to the parts of what the user has typed and then do some
"enhancements".
I take the parts from the user query and add 1 or more extra phrases.
The extra phrases are made from the 'simple words' in the query (simple
= bare terms with no explicit boost, field name, etc.). Now with ES, I
sometimes use the query parts and create queries against both a parent
and a child type in the index, but that's all after figuring out the
parts provided.

Currently, because it was ported from Lucene, the class that holds these
"simple terms and other phrases" keeps the parts as Lucene
BooleanClauses, because (a) the code started in Lucene and that's what
the query parser produces and (b) a BooleanClause is sufficient to hold
the set of must/should, the field name, the term, and the boost. This
worked for me because I mostly care about bare 'simple' terms; otherwise
clauses can become a query_string with minimal processing.
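
A small sketch of the "simple terms" idea described above, under stated assumptions. The Clause class is a hypothetical stand-in for the information a Lucene BooleanClause carries in this workflow (must/should, field name, term, boost). How the extra phrases are actually built is not spelled out in this thread, so joining the simple terms into one quoted phrase is only an illustrative guess.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class QueryEnhancer {
    // Hypothetical stand-in for the data held per parsed clause.
    static final class Clause {
        final boolean required; // true = must, false = should
        final String field;     // null when the user gave no explicit field
        final String term;
        final float boost;      // 1.0f when the user gave no explicit boost

        Clause(boolean required, String field, String term, float boost) {
            this.required = required;
            this.field = field;
            this.term = term;
            this.boost = boost;
        }
    }

    // "Simple" = bare term with no explicit field name and no explicit boost.
    static List<String> simpleTerms(List<Clause> clauses) {
        List<String> out = new ArrayList<>();
        for (Clause c : clauses) {
            if (c.field == null && c.boost == 1.0f) {
                out.add(c.term);
            }
        }
        return out;
    }

    // Assumed enhancement: one quoted phrase built from the simple terms,
    // or null when there are fewer than two of them.
    static String extraPhrase(List<Clause> clauses) {
        List<String> terms = simpleTerms(clauses);
        return terms.size() < 2 ? null : "\"" + String.join(" ", terms) + "\"";
    }

    public static void main(String[] args) {
        List<Clause> parsed = Arrays.asList(
                new Clause(false, null, "quick", 1.0f),
                new Clause(true, "title", "fox", 1.0f),   // has a field: not simple
                new Clause(false, null, "brown", 2.0f),   // has a boost: not simple
                new Clause(false, null, "dog", 1.0f));
        System.out.println(extraPhrase(parsed));
    }
}
```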

Ivan mentioned instanceof. The code for the above uses instanceof to
recognize TermQuery, to find the terms I'll be working with, and contains
instanceof BooleanQuery; that is all. There is code somewhere that
notices if the whole thing is a match-all query, which of course doesn't
lead to much need for query 'enhancing'.

-Paul

--


(Ivan Brusic) #11

On Thu, Oct 11, 2012 at 8:08 AM, Shay Banon kimchy@gmail.com wrote:

Your current approach is the one that should be used. Do you really need to
use the ES query parser (because of the mappings support)?

Good to know my approach should be correct. What I can't find is how ES
uses the Query returned from org.elasticsearch.index.query.QueryParser and
transforms it.

I don't need mapping support because I already have a wonderful workaround
for creating ES analyzers on the client side.

On Thu, Oct 11, 2012 at 9:02 AM, P. Hill parehill1@gmail.com wrote:

In my case, I'm using the ES query parser, before sending a search off to
ES, to get to the parts of what the user has typed and then do some
"enhancements".
I take the parts from the user query and add 1 or more extra phrases. The
extra phrases are made from the 'simple words' in the query (simple = bare
terms no explicit boost, field name etc.). Now with ES, I sometimes use
the query parts and create queries against both a parent and a child type
in the index, but that's all after figuring out the parts provided.

My use case is similar to yours. Constructing a query from many smaller
pieces. Like I said, the QueryParser is a great class.

Ivan mentioned instanceof. The code for the above uses instanceof to
recognize TermQuery, to find the terms I'll be working with, and contains
instanceof BooleanQuery; that is all. There is code somewhere that notices
if the whole thing is a match-all query, which of course doesn't lead to
much need for query 'enhancing'.

True, there are only two instanceof checks (TermQuery/BooleanQuery), but if
I am using an object-oriented language, I prefer to use OO techniques.
Might as well use a dynamic language and duck typing!
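
For comparison, a sketch of the classic object-oriented alternative to instanceof checks: a visitor with double dispatch. It only works for node types you control, since each node must implement accept. Every name below (Node, TermNode, BoolNode, NodeVisitor, TermCollector) is hypothetical and none of them is a Lucene or ElasticSearch class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class VisitorDemo {
    public static void main(String[] args) {
        Node tree = new BoolNode(Arrays.<Node>asList(
                new TermNode("123"), new TermNode("4"), new TermNode("3412")));
        System.out.println(tree.accept(new TermCollector()));
    }
}

// One method per node type replaces the chain of instanceof checks.
interface NodeVisitor<R> {
    R visitTerm(TermNode term);
    R visitBool(BoolNode bool);
}

abstract class Node {
    abstract <R> R accept(NodeVisitor<R> visitor);
}

final class TermNode extends Node {
    final String text;
    TermNode(String text) { this.text = text; }
    @Override <R> R accept(NodeVisitor<R> visitor) { return visitor.visitTerm(this); }
}

final class BoolNode extends Node {
    final List<Node> children;
    BoolNode(List<Node> children) { this.children = children; }
    @Override <R> R accept(NodeVisitor<R> visitor) { return visitor.visitBool(this); }
}

// Walks the tree and gathers every term without asking any node for its type.
final class TermCollector implements NodeVisitor<List<String>> {
    @Override public List<String> visitTerm(TermNode term) {
        return Collections.singletonList(term.text);
    }

    @Override public List<String> visitBool(BoolNode bool) {
        List<String> out = new ArrayList<>();
        for (Node child : bool.children) {
            out.addAll(child.accept(this)); // dispatch is decided by the child itself
        }
        return out;
    }
}
```

The trade-off is that adding a new node type means touching the visitor interface, whereas two instanceof checks stay local to one method.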

Cheers,

Ivan

--


(phill) #12

On 10/11/2012 12:28 PM, Ivan Brusic wrote:

Ivan mentioned instanceof. The code for the above uses instanceof
to recognize TermQuery, to find the terms I'll be working with, and
contains instanceof BooleanQuery; that is all. There is code
somewhere that notices if the whole thing is a match-all query,
which of course doesn't lead to much need for query 'enhancing'.

True, there are only two instanceof checks (TermQuery/BooleanQuery),
but if I am using an object-oriented language, I prefer to use OO
techniques. Might as well use a dynamic language and duck typing!

I'm at a loss to imagine walking a tree and finding a node and then not
having to ask if the node is a certain type like TermQuery.
If you want to call a method on the object to ask if it is a TermQuery
you could always use
someQuery.getClass().isAssignableFrom(TermQuery.class);
:)

But that does violate the Law of Demeter, because I formed a small
train of method calls, thus creating a potential for a "train wreck".
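
A side note on the reflective one-liner, shown as a sketch with hypothetical stand-in classes (these are not the Lucene classes). Class.isAssignableFrom is called on the would-be supertype, so the expression as written agrees with instanceof for an exact TermQuery but gives a different answer for a subclass or a superclass instance. TermQuery.class.isInstance(q) is the reflective counterpart of `q instanceof TermQuery`.

```java
final class TypeCheckDemo {
    // The check as written in the post: true only when q's runtime class is
    // TermQuery itself or one of its supertypes.
    static boolean asWritten(Query q) {
        return q.getClass().isAssignableFrom(TermQuery.class);
    }

    // Reflective counterpart of `q instanceof TermQuery`.
    static boolean likeInstanceof(Query q) {
        return TermQuery.class.isInstance(q);
    }

    public static void main(String[] args) {
        Query[] samples = { new Query(), new TermQuery(), new SubTermQuery() };
        for (Query q : samples) {
            System.out.println(q.getClass().getSimpleName()
                    + " asWritten=" + asWritten(q)
                    + " likeInstanceof=" + likeInstanceof(q));
        }
    }
}

// Hypothetical stand-ins; Query is left concrete here only to show the superclass case.
class Query {}
class TermQuery extends Query {}
class SubTermQuery extends TermQuery {}
```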

-Paul

--


(Jörg Prante) #13

Agreed, QueryParseContext should be an interface, so plugging in mock
parsers would be possible.

This would be useful for plugins that can perform query rewriting and query
transformations, e.g. inserting synonyms, named entities, or spelling
corrections with automatic resubmit at server side (and returning the
performed query transformations to the client).

My rather limited approach now is using the QueryBuilders only, at client
side, without Lucene syntax tree, without ES field mapping, for one-phase
only query construction.

Jörg

--


(Jörg Prante) #14

One more note:
org.elasticsearch.test.unit.index.query.SimpleIndexQueryParserTests gives
an example of how to instantiate an IndexQueryParserService for testing.

--

