Elasticsearch Aggregations Slow

Good day,
I have an Elasticsearch index with 50 million records in it. This is working as expected.
If I add aggregations to my query, it takes quite a long time to get the results. Any advice?
Thanks in advance.

Kind regards.

Greetings,
Try narrowing the date range so you don't need to aggregate as many records.
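
As a rough sketch of what a narrower date range could look like in the Query DSL. The index name, the date field (publicationdate), the aggregation name and field, and the dates are all placeholders I made up for illustration, so adjust them to your real mapping:

```
# Hypothetical names throughout: my-index, publicationdate, by_publisher, publisher.keyword
# The range clause sits in a bool filter, so it limits which documents are aggregated
GET /my-index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "publicationdate": {
              "gte": "2021-01-01",
              "lte": "2021-12-31"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "by_publisher": {
      "terms": { "field": "publisher.keyword" }
    }
  }
}
```

This assumes publicationdate is mapped as a date type; if it is stored as text, the range will not behave as expected.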

  • How slow is it?
  • What does the request look like?
  • What is the output?
  • Is it still the same after some runs? Or is it only the first run?
  • Which version are you using?
  • What kind of hardware?

Hi David,
Thanks.
It is slow

with aggregations
No of records in the result set: 7982
time taken : 35 seconds

without aggregations
same data is returned in 10 seconds

It is not only for the first request.
I am using version 7.13.3.
The hardware is top notch (if I don't use aggregations it is quite quick).

My request is a POCO built from the parameters below:
productform: "",
keyword: "",
contributor: "",
cop: "",
language: "",
publicationStatus: "",
imprint: "",
publisher: "",
wholesalers: "",
salesRights: "",
subject: "",
identifier: "",
populateAggregations: true,
pubDateFrom: "2010-09-10",
targetAudienceCode: "",
pubDateTo: "2022-09-10",
isAutoCompleteSearch: false

10 seconds is super slow.

Please share the request sent to Elasticsearch.

You might be running into this bug in 7.13.x, which was fixed in 7.14.0: "Slow StringTermsAggregatorFromFilters", Issue #76104 in elastic/elasticsearch (github.com).

You can check whether this is the case by applying the workaround provided in the issue, which is to set the following cluster setting:

"search.aggs.rewrite_to_filter_by_filter": false

I ran into a fairly similar issue not too long ago with a somewhat simple agg, and this turned out to be the issue.
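
A minimal sketch of applying that setting from the Kibana Dev Tools console, assuming it can be updated dynamically on your version and that a persistent (rather than transient) change is what you want:

```
# Workaround from issue #76104: disable the filter-by-filter rewrite
PUT /_cluster/settings
{
  "persistent": {
    "search.aggs.rewrite_to_filter_by_filter": false
  }
}

# Later, to go back to the default (for example after upgrading to 7.14.0 or newer)
PUT /_cluster/settings
{
  "persistent": {
    "search.aggs.rewrite_to_filter_by_filter": null
  }
}
```

If the aggregation time drops sharply after the first request, that points at this bug. If nothing changes, the cause is probably something else.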

But as mentioned previously, being able to see the query and agg you're actually running would be helpful here.

Hi David,
Below is the data I am sending to Elasticsearch from my GQL query.
Do you need anything else?

{gqlbooks(title:"", isbn:"", productform:"", keyword:"", contributor:"", cop:"", language:"", publicationStatus:"", imprint:"", publisher:"", wholesalers:"", salesRights:"", subject:"", identifier:"0123b1e0-b723-470b-9143-8a2a74edcfb2", populateAggregations:true, pubDateFrom:"1996-10-14", targetAudienceCode:"", pubDateTo:"2022-10-14",isAutoCompleteSearch:false) { resultCount publicationdate publisher author isbn13 title productform languagetext audiences author noofpages publicationstatus productclassifiers productclassifiercodes subtitle wholesalers markets publicationstatus cop bookName imprint imageurl rpgList bucketDTO distributors}}

I'd like to see the HTTP request that is sent to Elasticsearch.

I cannot tell from that how it is then translated into the Query DSL.

Hi David,
Apologies for the delay. Please find the HTTP request below:

GET /idx-myelasticindex/_search
{
  "size": 0,
  "query": {
    "match": {
      "bookName": "Prooi"
    }
  },
  "aggs": {
    "Terms_Aggregation": {
      "terms": {
        "field": "cop.keyword"
      }
    },
    "Author_Aggregation": {
      "terms": {
        "field": "author.keyword"
      }
    },
    "Format_Aggregation": {
      "terms": {
        "field": "productform.keyword"
      }
    },
    "Status_Aggregation": {
      "terms": {
        "field": "publicationstatus.keyword"
      }
    },
    "Readership_Aggregation": {
      "terms": {
        "field": "audiences.keyword"
      }
    }
  }
}

And what is the full response from Elasticsearch?
Please share it for both cases: one with the aggs and one without any aggregation, using the same query.
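
For the comparison run, I would expect the request to be the one you posted above with the aggs section removed, something like this (a sketch, using only the index and field names from your request):

```
# Same match query as in the posted request, without the "aggs" section
GET /idx-myelasticindex/_search
{
  "size": 0,
  "query": {
    "match": {
      "bookName": "Prooi"
    }
  }
}
```

Comparing the "took" value of this response with the one from the aggregated request shows how much time the aggregations add.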

Hi David,
The response object is quite large and exceeds the size allowed here, so I have uploaded it to the location below. Could you please get it from there? The first one is with aggregations and the other one is without:

https://www.kwiksnoop.com/documents/elastic.json

Thanks in advance.

So. A first look at this gives me:

With aggregations:

{
  "took" : 15311,
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    }
...
}

Without aggregations:

  "took" : 21,

First of all, we can see that without aggs, the time is only 21ms. Not 10s.
Then, the time spent on the aggregation is 15s for only 6 documents. Which does not make sense at all.

Could you run the same agg again and again, and give the output (only the first lines until hits is enough) after some runs?
Is it still slow?
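
To keep the output short when re-running, one option is the filter_path query parameter, which trims the response to the listed fields. This is a suggestion of mine rather than something the posted request used; the index, query, and aggregations are copied from the request above:

```
# filter_path only trims the response; the aggregations still run in full
GET /idx-myelasticindex/_search?filter_path=took,timed_out,_shards,hits.total
{
  "size": 0,
  "query": { "match": { "bookName": "Prooi" } },
  "aggs": {
    "Terms_Aggregation": { "terms": { "field": "cop.keyword" } },
    "Author_Aggregation": { "terms": { "field": "author.keyword" } },
    "Format_Aggregation": { "terms": { "field": "productform.keyword" } },
    "Status_Aggregation": { "terms": { "field": "publicationstatus.keyword" } },
    "Readership_Aggregation": { "terms": { "field": "audiences.keyword" } }
  }
}
```

Running this a few times in a row and noting "took" each time is enough to tell whether only the first run is slow.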

Hardware is top notch

What kind of hardware do you have?

Hi David,
Thank you

First of all, we can see that without aggs, the time is only 21ms. Not 10s.
Then, the time spent on the aggregation is 15s for only 6 documents. Which does not make sense at all.

I am not too worried about the response time without aggregations; the 10 s was for almost 8,000 records. With aggregations, the response time is huge for large data sets.

Could you run the same agg again and again, and give the output (only the first lines until hits is enough) after some runs?
Is it still slow?

There is very little improvement: after running it 3 times, the response time is 13.5 seconds.


 "took" : 13597,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },

Below is the server configuration:

RAM : 64 GB
64-bit OS
16-core processor
2.80 GHz
Windows 10

Could you add "profile": true when you run with the aggs?
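
For reference, "profile" goes at the top level of the request body, next to "size" and "query". A sketch based on the request you posted earlier:

```
# "profile": true adds a per-shard timing breakdown to the response
GET /idx-myelasticindex/_search
{
  "profile": true,
  "size": 0,
  "query": { "match": { "bookName": "Prooi" } },
  "aggs": {
    "Terms_Aggregation": { "terms": { "field": "cop.keyword" } },
    "Author_Aggregation": { "terms": { "field": "author.keyword" } },
    "Format_Aggregation": { "terms": { "field": "productform.keyword" } },
    "Status_Aggregation": { "terms": { "field": "publicationstatus.keyword" } },
    "Readership_Aggregation": { "terms": { "field": "audiences.keyword" } }
  }
}
```

The response should then contain a "profile" section; the aggregations part of it is probably the interesting one here, since it shows where each terms aggregation spends its time.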

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v

If some outputs are too big, please share them on gist.github.com and link them here.

Hi David,
Please see the picture below for the queries.

Please don't post images of text as they are hard to read, may not display correctly for everyone, and are not searchable.

Instead, paste the text and format it with </> icon or pairs of triple backticks (```), and check the preview window to make sure it's properly formatted before posting it. This makes it more likely that your question will receive a useful answer.

It would be great if you could update your post to solve this.

Also add the first query I asked for:

GET /

Thanks