Did ES ever consider replacing the current query DSL with an external DSL?


(xin zhang) #1

I have used ES for one year and I like all the features it provides except the
query DSL. The JSON-style DSL is not intuitive, and it confused me for a while
when I first started putting multiple conditions together.

Inspired by Jira, I wonder why ES doesn't provide a DSL like:

project = ES AND issuetype = "New Feature" AND fixVersion = 3.1 ORDER BY
created DESC, cf[10514] DESC

This kind of external DSL needs a parser, which might be the reason ES chose
a JSON-style internal DSL in the beginning. But from a user's perspective, an
external DSL is much easier to understand and use. I know there are SDKs
for most popular languages, which make people's lives a lot easier, but an
easier query DSL is still valuable to people who want to work directly with
the ES REST API.
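
For comparison, here is roughly what the Jira-style query above could look like in the JSON DSL, built as a Python dict. This is only a sketch under assumptions the post does not state: `project`, `issuetype` and `fixVersion` are taken to be exact-value (not analyzed) fields, and `cf[10514]` is treated as an ordinary field name.

```python
import json

# Hypothetical translation of:
#   project = ES AND issuetype = "New Feature" AND fixVersion = 3.1
#   ORDER BY created DESC, cf[10514] DESC
search_body = {
    "query": {
        "bool": {
            # Every clause in "must" has to match, which mirrors AND.
            "must": [
                {"term": {"project": "ES"}},
                {"term": {"issuetype": "New Feature"}},
                # Assumption: the version is stored as a string label.
                {"term": {"fixVersion": "3.1"}},
            ]
        }
    },
    # Sort keys are applied in order, like the ORDER BY list.
    "sort": [
        {"created": {"order": "desc"}},
        {"cf[10514]": {"order": "desc"}},
    ],
}

print(json.dumps(search_body, indent=2))
```

The nesting of `bool` and `must` is the part that tends to confuse when several conditions are combined; the one-line form hides it.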

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7650a534-fab1-4ccf-addd-a59085f35f08%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

You can write your own plugin that registers a custom parser when
Elasticsearch starts up and that is able to parse the above statement. So
basically there is nothing stopping you from doing this. You could embed
this parser in the JSON DSL, but if you do not like that, you could simply
write your own REST HTTP action, which parses the above as text without
any JSON being involved.

Take a look at some REST actions and at the QueryParser interface and its
implementations.
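
A real plugin would be written in Java against the Elasticsearch plugin API. The sketch below only illustrates the translation step such a parser would perform, in standalone Python. The function names `parse_value` and `translate` are hypothetical, only `=`, `AND` and `ORDER BY` are supported, every condition becomes a `term` clause, and a quoted value containing the word AND would break the naive splitting.

```python
import json
import re


def parse_value(token):
    """Strip the double quotes from a quoted literal; keep bare tokens as-is."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]
    return token


def translate(statement):
    """Translate 'f = v AND ... [ORDER BY f DIR, ...]' into a search body."""
    # Split off the optional ORDER BY clause.
    parts = re.split(r"\s+ORDER\s+BY\s+", statement.strip(),
                     maxsplit=1, flags=re.IGNORECASE)
    clauses = []
    # Naive split on AND; does not respect AND inside quoted values.
    for cond in re.split(r"\s+AND\s+", parts[0], flags=re.IGNORECASE):
        match = re.fullmatch(r'(\S+)\s*=\s*("[^"]*"|\S+)', cond.strip())
        if match is None:
            raise ValueError("cannot parse condition: %r" % cond)
        field, raw = match.groups()
        clauses.append({"term": {field: parse_value(raw)}})
    body = {"query": {"bool": {"must": clauses}}}
    if len(parts) == 2:
        sort = []
        for item in parts[1].split(","):
            tokens = item.split()
            # Direction defaults to ascending when omitted.
            direction = tokens[1].lower() if len(tokens) > 1 else "asc"
            sort.append({tokens[0]: {"order": direction}})
        body["sort"] = sort
    return body


statement = ('project = ES AND issuetype = "New Feature" AND fixVersion = 3.1 '
             'ORDER BY created DESC, cf[10514] DESC')
print(json.dumps(translate(statement), indent=2))
```

Note that every value comes out as a string here; deciding on the right type is a separate problem.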

Elasticsearch tries to be as modular as possible in order to give you
exactly this flexibility: you can write your own custom
implementations in case you are unhappy with the status quo. And hopefully
open source them :)

--Alex

(In reply to xin zhang's post of Thu, Dec 19, 2013, 8:05 PM.)


(Jörg Prante) #3

Just some food for thought:

In the DSL you give, there are some subtle issues that make it hard to
implement a parser and execute queries correctly.

For example, the value "New Feature": is it just a phrase, or is it two
terms? Is a span query for the two terms also valid?

Another issue is well-formed typing of input data. Is "ES" in the query an
input of type "string"? And is "3.1" a double value or also a string?
Can types always be inferred? What about date parsing: will date types be
assigned automatically? Maybe by looking up the ES mapping (well, the ES
parsers do that already)? Or how do you decide whether "true" is the string
"true" or the boolean constant? ES catches this, but my point is that in the
general case an external DSL has to deal with the flaws of JSON data typing.
You are forced to re-implement the ES parser for all these nasty pitfalls.
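
To make the typing problem concrete, here is a minimal sketch of resolving literals by looking up a mapping. The `MAPPING` dict, its field names and the `coerce` function are hypothetical and far simpler than a real ES mapping; the point is only that the same literal ("3.1", "true") ends up as a different value depending on the field.

```python
from datetime import datetime

# Hypothetical, simplified mapping: field name -> declared type.
MAPPING = {
    "project": "string",
    "fixVersion": "string",  # "3.1" is a version label here, not a number
    "votes": "long",
    "score": "double",
    "resolved": "boolean",
    "created": "date",
}


def coerce(field, literal):
    """Convert a raw query literal to the type declared for the field."""
    field_type = MAPPING.get(field, "string")  # unknown fields stay strings
    if field_type == "long":
        return int(literal)
    if field_type == "double":
        return float(literal)
    if field_type == "boolean":
        if literal.lower() not in ("true", "false"):
            raise ValueError("not a boolean: %r" % literal)
        return literal.lower() == "true"
    if field_type == "date":
        # Assumes ISO dates such as 2013-12-19; real mappings allow other formats.
        return datetime.strptime(literal, "%Y-%m-%d").date()
    return literal


print(repr(coerce("fixVersion", "3.1")))  # stays a string
print(repr(coerce("score", "3.1")))       # becomes a float
print(repr(coerce("resolved", "true")))   # becomes a boolean
print(repr(coerce("title", "true")))      # unmapped field, stays a string
```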

How do you declare facets and filters in an external DSL? Or multi-phrase and
multi-wildcard searches? It's not straightforward if there is simply no
context information on how to execute such things. How do you parse and
translate wildcards mixed or nested with phrases, such as "scien* 'week*
magazin*'"? I was surprised how many folks are trained to use wildcards
excessively. The only option is to replace such "bad queries" with
heuristics that can be executed on ES with high performance, with ranked
results etc.

I have written a CQL parser
http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.pdf
to generate ES DSL, but for the Java API only. If anyone is interested in
adding a CQL parser as a REST action, I could offer it as a plugin. It is
of course not perfect; I'm not very satisfied with the result.

My experience so far, at least with CQL, which is a weakly typed query
language (it has no notion of input data types), is that external
query languages must really be able to match the power of Lucene/ES
features, or you get into trouble implementing simplifications, fallbacks,
and shortcuts all the way.

So my favorite is still the ES DSL, and for setting up simple searches there
are special query types that are designed for simplified free-form
input. E.g. the ES DSL query type "query_string" understands the Lucene syntax,
there is the "match" query, and now we also have "simple_query"
https://github.com/elasticsearch/elasticsearch/pull/4402
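
As a rough illustration of those query types, the request bodies below pass free-form user input straight through. The field name `summary` and the sample inputs are made up, and I am assuming that the "simple_query" type is exposed under the name `simple_query_string`.

```python
import json

# Hypothetical field name, for illustration only.
FIELD = "summary"


def query_string_body(text):
    """Lucene syntax: field:value, AND/OR/NOT, quoted phrases, wildcards."""
    return {"query": {"query_string": {"query": text, "default_field": FIELD}}}


def match_body(text):
    """Plain analyzed text against one field; no operators are interpreted."""
    return {"query": {"match": {FIELD: text}}}


def simple_query_string_body(text):
    """Assumption: the "simple_query" type, under the name simple_query_string."""
    return {"query": {"simple_query_string": {"query": text, "fields": [FIELD]}}}


for body in (
    query_string_body('issuetype:"New Feature" AND project:ES'),
    match_body("new feature"),
    simple_query_string_body('"new feature" +ES'),
):
    print(json.dumps(body))
```

In all three cases the user only types the inner string; the JSON wrapper can be generated by the application.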

I'm interested in OpenSearch for ES http://www.opensearch.org/ so if anyone
is working on this, it would be nice to know.

Jörg



(xin zhang) #4

Thanks Alex and Jörg. Frankly speaking, I hadn't thought about the details of
implementing such an external DSL in ES. I am just thinking, from a user's
perspective, about what gives the best user experience.

Jörg raises a good question: how to distinguish data types given the
schemaless nature of ES. The DSL may need some keywords (e.g. int,
long, date) to state the data type explicitly where it matters; otherwise all
data is treated as a string. The other questions I cannot answer, because
I have not even used some of the features, such as facets. I know it's not
straightforward to do such a thing, but I do think that if the JSON DSL can
solve these problems, then an external DSL can too, because it is more
flexible, like a programming language. I have been playing with some NoSQL
databases as well as search engines like ES, and my impression is that every
product has its own search DSL, none of which is user friendly. Though I think
it might be impossible for this area to agree on a standard similar to SQL,
given that these products focus on different things and usually vary a lot,
I think using a SQL-style DSL without joins and sub-selects can make users'
lives much easier.
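
One way the type-keyword idea could look, purely as a sketch: literals may be wrapped in a keyword such as `long(...)` or `date(...)`, and anything untagged is treated as a string. The keyword set, the `keyword(value)` syntax and the function name `parse_literal` are all hypothetical.

```python
import re
from datetime import date

# Hypothetical type keywords; anything untagged is treated as a string.
CONVERTERS = {
    "int": int,
    "long": int,
    "double": float,
    "date": date.fromisoformat,  # expects YYYY-MM-DD
}


def parse_literal(token):
    """Parse 'long(42)' style literals; untagged tokens remain strings."""
    match = re.fullmatch(r"(\w+)\((.*)\)", token)
    if match and match.group(1) in CONVERTERS:
        return CONVERTERS[match.group(1)](match.group(2))
    # No recognized keyword: drop surrounding quotes and keep it as a string.
    return token.strip('"')


print(repr(parse_literal("long(42)")))
print(repr(parse_literal("date(2013-12-19)")))
print(repr(parse_literal("3.1")))
print(repr(parse_literal('"New Feature"')))
```

A condition like `fixVersion = 3.1` would then stay a string comparison unless the user writes `fixVersion = double(3.1)`.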

(In reply to Jörg Prante's post of Friday, December 20, 2013, 4:32:30 AM UTC-8.)


(Lukáš Vlček) #5

Jörg,
is OpenSearch active? The last time I checked, it seemed pretty much dead.
Lukáš


(Jörg Prante) #6

OpenSearch is a bit quiet, but not defunct; there is a low-activity
mailing list

https://groups.google.com/forum/#!forum/opensearch

My impression is that there is not much more that OpenSearch can be developed
into, as a "de facto" very lightweight standard.

Example

http://chroniclingamerica.loc.gov/search/pages/results/?proxtext=ducks&format=json

That is a special search endpoint that understands a simplified query
parameter. The query parameter can be anything, for example an SQL-like
syntax like the one xin zhang suggested.

The effect would be the integration of ES into third-party products that
offer simplified search.

OpenSearch is known from browser search forms like Firefox search plugins.
This would of course not replace the official ES clients, but some PoCs would
be easier ("let's get some ES results into our product and show them in the
browser").

OpenSearch integration into Wikis, Blogs
http://www.opensearch.org/Community/OpenSearch_software

OpenSearch Geo
http://www.weichand.de/2010/10/27/opensearch-geo-die-einfache-raeumliche-suche/

All in all, what the plugin would have to do is a bit of processing of
simplified queries and reformatting of the ES JSON result, preferably as Atom
feeds, in both XML and JSON. So a reverse HTTP proxy could just pass
parameters and results through.
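
The reformatting half could be sketched like this: take the `hits` of an ES search response and render them as a very small Atom feed. This is an illustration in Python rather than plugin code, the function name `hits_to_atom` is hypothetical, a `title` field in each document is assumed, and required Atom elements such as `updated` are left out for brevity.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def hits_to_atom(es_response, feed_title):
    """Render the hits of an ES search response as a minimal Atom feed."""
    # Serialize Atom as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element("{%s}feed" % ATOM_NS)
    ET.SubElement(feed, "{%s}title" % ATOM_NS).text = feed_title
    for hit in es_response["hits"]["hits"]:
        entry = ET.SubElement(feed, "{%s}entry" % ATOM_NS)
        ET.SubElement(entry, "{%s}id" % ATOM_NS).text = hit["_id"]
        # Assumes each document has a "title" field; fall back to the id.
        title = hit.get("_source", {}).get("title", hit["_id"])
        ET.SubElement(entry, "{%s}title" % ATOM_NS).text = title
    return ET.tostring(feed, encoding="unicode")


# Hand-written stand-in for a search response.
response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"title": "ducks"}},
            {"_id": "2", "_source": {}},
        ]
    }
}
print(hits_to_atom(response, "Search results"))
```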

Jörg

(In reply to Lukáš Vlček's post of Sat, Dec 21, 2013, 10:06 AM.)


(Techno Shaft) #7

I had the exact same requirement: building a user-facing query DSL on top of the Elasticsearch DSL. I had a look at SearchRetrieve and it looks quite promising. I wish it were open sourced!

