I have a list of must and must_not items that I currently combine into one giant query, and I want to know whether this is the best way to approach the problem.
I have 470 must items and 485 must_not items that act as whitelist/blacklist rules for the data. The analytic is built in Spark and the data is housed in Elasticsearch. The query I pass to Spark contains one of the must items followed by all 485 must_not items.
Example of the query:
{
  "query": {
    "bool": {
      "must": { "match": { "tag": "apple" } },
      "must_not": [
        { "match": { "city": "new york" } },
        { "match": { "name": "pizza" } },
        ...
      ]
    }
  }
}
As you can guess, the query itself is rather large and takes around 2 seconds to return results. I submit one such query for each of the must items, so 470 queries in total. The application currently takes around 22 minutes to complete.
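To make the shape of the loop concrete, here is a simplified sketch of how the per-tag query bodies get built (the function name and the placeholder lists are illustrative, not the exact code; the real lists hold 470 tags and 485 rules):

```python
# Illustrative sketch: build one bool query per whitelist tag, each carrying
# the full blacklist. One query per tag means 470 separate round trips.
def build_query(must_tag, must_not_rules):
    """Build the bool query for one must tag against all must_not rules."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"tag": must_tag}},
                "must_not": [
                    {"match": {field: value}} for field, value in must_not_rules
                ],
            }
        }
    }

must_tags = ["apple"]  # illustrative; 470 tags in reality
must_not_rules = [("city", "new york"), ("name", "pizza")]  # 485 rules in reality

queries = [build_query(tag, must_not_rules) for tag in must_tags]
```

Each element of `queries` is what gets passed to Spark as the Elasticsearch query for that tag.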
My question: is this the best way to tackle this problem, or is there a way to make it faster? And is this even a good fit for Elasticsearch at all, given the gigantic, complex query?
I previously attempted to perform Spark joins after passing a single query with just the must_not items, but that takes far longer than the 470 individual Elasticsearch queries. I used a broadcast hash join because the must data is smaller than the resultant data frame.
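For reference, the join-based alternative boils down to the following set logic, sketched here in plain Python rather than Spark (in the actual job the small must list is the broadcast side of a `df.join(broadcast(must_df), "tag")`):

```python
# Plain-Python sketch of the broadcast-join alternative (no Spark here, just
# the logic). The small must list plays the role of the broadcast side that
# Spark would ship to every executor for a broadcast hash join.
def broadcast_join(rows, must_tags):
    """Keep only rows whose tag appears in the (small, broadcast) must list."""
    must_set = set(must_tags)  # broadcast side: small enough to fit in memory
    return [row for row in rows if row["tag"] in must_set]

# Illustrative rows, standing in for the data frame returned by the
# must_not-only Elasticsearch query:
rows = [
    {"tag": "apple", "city": "boston"},
    {"tag": "pear", "city": "chicago"},
]
kept = broadcast_join(rows, ["apple"])  # only the "apple" row survives
```

The join itself is cheap once the data is in Spark; the cost in my case comes from the much larger result set the must_not-only query pulls back first.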
Thank you for the help.