Query string query mini-language vs. grammar implementation?

x0ne_2 · July 9, 2014, 12:35am

Ever since I discovered the mini-language provided through the query string
query
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html),
I have had a hard time going back to the difficult process of mapping what
someone wants to a proper elasticsearch query. As such, I have essentially
provided users with the ability to create their own query strings and then
execute them directly against the cluster (10s of millions of documents).

This approach works great until several complex queries are run in a row,
which then appears to send the cluster into an OOM panic. Is there a way to
put some sanity checks inside of the query string query to avoid insane
results coming back? Can I limit the number of results loaded onto the heap
or put into the cache? Have others just rolled their own grammar parsing
instead of using the mini-language directly?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c7d5626c-b485-47d0-bcd8-5472bb38b5d3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Robert_Muir_2 · July 10, 2014, 12:51pm

Have you looked at simple query string?

This one is more limited, but it has a flags parameter that lets you
turn every feature or operator on/off. So you could disable wildcard,
phrase, fuzzy, etc.
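
As a rough sketch of what that could look like, the snippet below builds a simple_query_string request body with only a few operators switched on. The helper name build_simple_query, the field name "body", and the choice of flags are assumptions made for illustration; check the flag names against the documentation for your Elasticsearch version.

```python
import json

# Operators exposed to users (an assumption for this sketch).
# Operators left out of this list, such as PREFIX, FUZZY or PHRASE, are disabled.
ALLOWED_FLAGS = ("AND", "OR", "NOT")


def build_simple_query(user_input, fields=("body",), flags=ALLOWED_FLAGS):
    """Return a search request body that uses simple_query_string."""
    return {
        "query": {
            "simple_query_string": {
                "query": user_input,
                "fields": list(fields),
                # Flags are combined into one pipe-separated string.
                "flags": "|".join(flags),
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the body of a _search request.
    print(json.dumps(build_simple_query("malware +dns -benign"), indent=2))
```

The user's text is passed through untouched; the restriction comes from the flags value alone, so there is nothing to parse on the application side.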

Maciej_Dziardziel · July 10, 2014, 3:25pm

You can define a timeout (e.g. by adding timeout=400 to the URL) to limit the
time ES will spend waiting for results, at the cost of possibly returning
incomplete results.
Any "sanity checks" are tricky, as you'd need to define what is allowed,
how it will perform against your index, and then parse the query to enforce
it (and then keep up with Lucene on upgrades).
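
To make the trade-off concrete, here is a minimal sketch of a pre-check that runs before a user's query string reaches the cluster, combined with a timeout and a cap on the number of hits. The limits, the function names, and the specific patterns rejected are all assumptions for illustration; the checks are deliberately crude (for example, any text between two slashes is treated as a regular expression) and are no substitute for a real parser.

```python
import re

MAX_LENGTH = 256     # hypothetical cap on query length
MAX_WILDCARDS = 2    # hypothetical cap on * and ? characters
MAX_RESULTS = 100    # hypothetical cap on hits per request


def check_query_string(q):
    """Return a list of reasons to reject q; an empty list means it passed."""
    problems = []
    if len(q) > MAX_LENGTH:
        problems.append("query too long")
    # A term that starts with * or ? is expensive to expand.
    if re.search(r"(^|[\s(:])[*?]", q):
        problems.append("leading wildcard")
    if len(re.findall(r"[*?]", q)) > MAX_WILDCARDS:
        problems.append("too many wildcards")
    # /.../ is regular-expression syntax in the query string mini-language.
    if re.search(r"/.+/", q):
        problems.append("regular expression")
    return problems


def build_search_body(q, size=10):
    """Build a query_string request with a timeout and a capped size."""
    problems = check_query_string(q)
    if problems:
        raise ValueError("rejected query: " + ", ".join(problems))
    return {
        "timeout": "400ms",
        "size": min(size, MAX_RESULTS),
        "query": {"query_string": {"query": q}},
    }
```

A caller would catch the ValueError and show the reasons to the user instead of forwarding the query. Every rule here has to be revisited when the query syntax changes, which is exactly the maintenance cost described above.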

If your users don't require complex queries, don't provide them. Since I'm
mostly dealing with internet users, I usually write my own parsers.
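
For a sense of how small such a parser can be, here is a minimal sketch, not the parser described above. It accepts whitespace-separated terms with an optional + or - prefix and turns them into a bool query of match clauses, so wildcards, regular expressions and field names in the input are never interpreted as operators. The function name parse_user_query, the field name "body", and the term limit are assumptions for illustration.

```python
def parse_user_query(text, field="body", max_terms=10):
    """Translate 'foo +bar -baz' into a bool query of match clauses."""
    must, must_not = [], []
    # Only the first max_terms tokens are considered; the rest are dropped.
    for token in text.split()[:max_terms]:
        negate = token.startswith("-")
        term = token.lstrip("+-")
        if not term:
            continue  # skip tokens that were only + or - characters
        clause = {"match": {field: term}}
        (must_not if negate else must).append(clause)
    return {"query": {"bool": {"must": must, "must_not": must_not}}}
```

Because each term ends up in a match query, the text is analyzed like ordinary field content and the set of supported operators is exactly what the loop recognizes.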
