JSON Query API v2


(Andrew Gross-2) #1

I was curious if there were any plans to update or modify the JSON query
API in ES 2.0+?

While I find the API to be very powerful, it is confusing to construct a valid
request, and doing so requires special-casing a lot of rules. I have some
thoughts below on what I see as the current issues, and some suggestions to
correct them. I don't intend for this to be a rant, just to provoke discussion.
This is done purely from the point of view of constructing queries (not
parsing them), and only for the JSON DSL query syntax for searching (not
percolate or aggregations). My aim is to make it easier to re-use parts of
queries across a codebase without hand-crafting them every time.

It is currently hard to construct small parts of a JSON query without
knowing all of the elements involved. Looking at a simple query and a
filtered query:

Simple Query:
{
  "query": {
    "match_all": {}
  }
}

Filtered Query:
{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "term": {
              "foo": "bar"
            }
          }
        ]
      },
      "query": {
        "match_all": {}
      }
    }
  }
}

We see the syntax tree change so that the initial 'query' becomes nested
and the root of the tree changes. Once we add a scoring function, it
morphs even further.
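
To make the re-rooting concrete, here is a minimal Python sketch of what client code has to do today to add a filter. The helper name `wrap_filtered` is hypothetical (not part of any Elasticsearch client); it only builds the same structure as the two examples above.

```python
import copy


def wrap_filtered(body, filter_clause):
    """Return a copy of `body` whose query is nested inside a 'filtered' query.

    Hypothetical helper: it only manipulates plain dicts.
    """
    new_body = copy.deepcopy(body)
    # The old top-level query moves one level down; 'filtered' takes its place.
    new_body["query"] = {
        "filtered": {
            "filter": filter_clause,
            "query": new_body["query"],
        }
    }
    return new_body


simple = {"query": {"match_all": {}}}
filtered = wrap_filtered(simple, {"and": [{"term": {"foo": "bar"}}]})
```

After the call, `match_all` no longer lives at `body["query"]` but at `body["query"]["filtered"]["query"]`, so any code that knew the old location has to be updated.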

Scored Query:
{
  "query": {
    "function_score": {
      "query": {
        "filtered": {
          "filter": {
            "and": [
              {
                "term": {
                  "foo": "bar"
                }
              }
            ]
          },
          "query": {
            "match_all": {}
          }
        }
      },
      "script_score": {
        "script": "result = 0.0 + 1.0;"
      },
      "boost_mode": "replace"
    }
  }
}

This follows some of the same rules (nested inside a new scope), however,
not all of the changes get placed together. We have both a 'script_score'
block and a 'boost_mode' section. This means that when I want to add
scoring to my query I need to know my scoring block as well as the rest of
the query tree so that I can properly place 'boost_mode'.
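
As a rough sketch of that coupling (the helper `add_scoring` is a hypothetical name, and the bodies are plain dicts), the scoring fragment cannot be inserted as one unit. Its keys have to be spread out as siblings of the existing query inside `function_score`:

```python
import copy


def add_scoring(body, scoring_block):
    """Return a copy of `body` wrapped in 'function_score'.

    Hypothetical helper: `scoring_block` holds keys such as 'script_score'
    and 'boost_mode', which end up as siblings of the wrapped query.
    """
    new_body = copy.deepcopy(body)
    function_score = {"query": new_body["query"]}
    # The scoring keys are merged next to 'query' rather than kept together.
    function_score.update(copy.deepcopy(scoring_block))
    new_body["query"] = {"function_score": function_score}
    return new_body


scoring = {
    "script_score": {"script": "result = 0.0 + 1.0;"},
    "boost_mode": "replace",
}
scored = add_scoring({"query": {"match_all": {}}}, scoring)
```

The caller needs both the scoring fragment and the whole existing body, which is exactly the knowledge I would like to avoid requiring.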

A simple(r) example. In a simple scored query, if I want to modify my
'match_all' block, my path becomes
"query" -> "function_score" -> "query"

Once I add filtering to the query, the path changes, causing a broken query
if I insert in the old location.
"query" -> "function_score" -> "query" -> "filtered" -> "query"

It would be much simpler if I could define my scoring block and drop it
into a query at a static path without worrying about what else is in the query.
This case is a simple illustration, but the JSON query DSL contains many
such instances, especially around cases like 'scoring', where using a single
scoring block vs. multiple scoring functions radically changes the
structure of the scoring section.
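
The shifting path from the two examples above can be sketched in Python. Both function names are hypothetical, and the walker only knows about the two wrappers used in this post (`function_score` and `filtered`); a real one would need a rule for every compound query.

```python
def innermost_query_path(body):
    """Return the list of keys leading to the innermost query in `body`.

    Hypothetical helper: only understands 'function_score' and 'filtered'.
    """
    path = ["query"]
    node = body["query"]
    while True:
        wrapper = next(
            (w for w in ("function_score", "filtered") if w in node), None
        )
        if wrapper is None or "query" not in node[wrapper]:
            return path
        path += [wrapper, "query"]
        node = node[wrapper]["query"]


def get_path(body, path):
    """Follow `path` (a list of keys) down into `body`."""
    node = body
    for key in path:
        node = node[key]
    return node


scored = {
    "query": {
        "function_score": {
            "query": {"match_all": {}},
            "script_score": {"script": "result = 0.0 + 1.0;"},
            "boost_mode": "replace",
        }
    }
}
scored_filtered = {
    "query": {
        "function_score": {
            "query": {
                "filtered": {
                    "filter": {"and": [{"term": {"foo": "bar"}}]},
                    "query": {"match_all": {}},
                }
            },
            "script_score": {"script": "result = 0.0 + 1.0;"},
            "boost_mode": "replace",
        }
    }
}

short_path = innermost_query_path(scored)
long_path = innermost_query_path(scored_filtered)
```

Here `short_path` has three keys and `long_path` has five. Reading `short_path` in the filtered body returns the `filtered` wrapper instead of `match_all`, which is how a query gets broken by inserting at the old location.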

I understand that this was designed iteratively, and that the syntax will
not be perfect for both parsing and construction. Now that the JSON query
DSL seems to have a stable set of elements, it would be useful to set it up
so that it can be written in a simple manner. A few considerations:

  1. When adding an element such as scoring, have it only modify elements
    below it in the tree (aside from its initial insertion point).
  2. Keep the root of the tree static and have the existence of a top level
    key modify behavior, instead of needing to change the nesting of elements.
  3. Somehow stop nesting the term "query" all over the place; it is definitely
    the most confusing thing for new users in my experience. =D

Here is a proposed top level DSL example. It's incomplete and probably
missing some things but useful as an illustration:
{
  "filter": {
    "and": [
      {
        "term": {
          "foo": "bar"
        }
      }
    ]
  },
  "query": {
    "match_all": {}
  },
  "scoring": {
    "script_score": {
      "script": "result = 0.0 + 1.0;"
    },
    "boost_mode": "replace"
  },
  "sort": [
    {
      "foo": {
        "order": "desc",
        "mode": "average"
      }
    }
  ]
}
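
To show that the flat form carries the same information as the nested one, here is a small Python sketch that translates the proposed layout into the current nested DSL. The function `to_nested` is hypothetical, as is the top-level `scoring` key, which exists only in this proposal; the sketch assumes a missing `query` means `match_all`.

```python
def to_nested(flat):
    """Translate the proposed flat body into the current nested query DSL.

    Hypothetical sketch: handles only the 'query', 'filter', 'scoring'
    and 'sort' keys used in the example above.
    """
    query = flat.get("query", {"match_all": {}})
    if "filter" in flat:
        # A filter wraps the query in 'filtered'.
        query = {"filtered": {"filter": flat["filter"], "query": query}}
    if "scoring" in flat:
        # Scoring wraps everything in 'function_score', with the scoring
        # keys placed next to the wrapped query.
        function_score = {"query": query}
        function_score.update(flat["scoring"])
        query = {"function_score": function_score}
    body = {"query": query}
    if "sort" in flat:
        body["sort"] = flat["sort"]
    return body


flat = {
    "filter": {"and": [{"term": {"foo": "bar"}}]},
    "query": {"match_all": {}},
    "scoring": {
        "script_score": {"script": "result = 0.0 + 1.0;"},
        "boost_mode": "replace",
    },
}
nested = to_nested(flat)
```

With this input, `nested` has the same shape as the scored query earlier in this post. Each top-level key can be added or removed independently, and the wrapping order lives in one place instead of in every caller.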

Thanks for reading over this. I was unable to find a roadmap for
prospective features, so if there are already plans to work on this, feel
free to disregard my comments.

Thanks,
Andrew

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/52e2fa0e-668a-4bf9-898d-ecbb61e5aea9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Andrew Gross-2) #2

Bump

In reply to the original post by Andrew Gross, Monday, June 30, 2014 3:28:46 PM UTC-4.
