Elastic Search Aggregations Slow

Srini12 · October 14, 2021, 9:24am

Good day,
I have an Elasticsearch index with 50 million records in it, and it is working as expected.
If I add aggregations to my query, it takes quite a long time to get the results. Any advice?
Thanks in advance.

Kind regards.

alfianaf · October 14, 2021, 9:28am

Greetings,
Try lowering the date range so you don't need to aggregate as many records.
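Aggregations are computed over the documents matched by the query, so narrowing the date window shrinks their input. A minimal sketch in Python that builds such a request body, assuming a date field called `publicationdate` (a hypothetical name borrowed from the GQL output fields later in the thread; the real mapping is not shown) and an illustrative narrower window:

```python
import json


def build_query(book_name, pub_date_from, pub_date_to):
    """Return a search body that only matches documents inside a publication-date window.

    Assumption: "publicationdate" is a hypothetical field name; check the real
    index mapping before relying on it.
    """
    return {
        "size": 0,
        "query": {
            "bool": {
                # Full-text part of the query (scored).
                "must": [{"match": {"bookName": book_name}}],
                # Non-scoring range filter; a narrower window leaves fewer
                # matching documents for the aggregations to process.
                "filter": [
                    {"range": {"publicationdate": {"gte": pub_date_from, "lte": pub_date_to}}}
                ],
            }
        },
    }


body = build_query("Prooi", "2020-09-10", "2022-09-10")
print(json.dumps(body, indent=2))
```

Putting the range in `filter` rather than `must` keeps it out of scoring, which is the usual choice for date restrictions.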

dadoonet · October 14, 2021, 10:34am

How slow is it?
What does the request look like?
What is the output?
Is it still the same after a few runs, or only on the first run?
Which version are you using?
What kind of hardware?

Srini12 · October 14, 2021, 10:58am

Hi David,
Thanks.
It is slow.

With aggregations:
Number of records in the result set: 7982
Time taken: 35 seconds

Without aggregations:
The same data is returned in 10 seconds.

It is not only the first request.
I am using version 7.13.3.
The hardware is top notch (if I don't use aggregations, it is quite quick).

My request is a POCO built from the parameters below:
productform:"",
keyword: "",
contributor: "",
cop: "",
language: "",
publicationStatus: "",
imprint: "",
publisher: "",
wholesalers: "",
salesRights: "",
subject: "",
identifier: "",
populateAggregations: true,
pubDateFrom: "2010-09-10",
targetAudienceCode: "",
pubDateTo: "2022-09-10",
isAutoCompleteSearch: false

dadoonet · October 14, 2021, 11:46am

10 seconds is super slow.

Please share the request sent to Elasticsearch.

BenB196 · October 14, 2021, 12:05pm

You might be running into this bug in 7.13.x, which was fixed in 7.14.0: "Slow StringTermsAggregatorFromFilters" (Issue #76104, elastic/elasticsearch on GitHub).

You can check whether this is the case by applying the workaround provided in the issue, i.e. setting the following cluster setting:

"search.aggs.rewrite_to_filter_by_filter": false
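A sketch of the request bodies for applying and later undoing that workaround, generated in Python. It assumes the setting is dynamic and can be set persistently through `PUT /_cluster/settings`, as the workaround in the issue implies; sending JSON `null` for a cluster setting resets it to its default.

```python
import json

SETTING = "search.aggs.rewrite_to_filter_by_filter"


def cluster_settings_body(value):
    # Body for PUT /_cluster/settings. "persistent" survives a cluster restart;
    # a value of None is serialized as JSON null, which resets the setting to its default.
    return {"persistent": {SETTING: value}}


# Workaround: disable the filter-by-filter rewrite.
disable = cluster_settings_body(False)
# Later (e.g. after upgrading to 7.14.0 or newer): reset to the default.
reset = cluster_settings_body(None)

print("PUT /_cluster/settings")
print(json.dumps(disable, indent=2))
print("PUT /_cluster/settings")
print(json.dumps(reset, indent=2))
```

If the aggregations become fast with the first body applied, that points at the bug; the second body removes the override once it is no longer needed.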

I ran into a fairly similar issue not too long ago with a somewhat simple agg, and this turned out to be the issue.

But as mentioned previously, being able to see the query and agg you're actually running would be helpful here.

Srini12 · October 14, 2021, 1:10pm

Hi David,
Below is the data I am sending to ES from my GQL query.
Do you want something else?

{gqlbooks(title:"", isbn:"", productform:"", keyword:"", contributor:"", cop:"", language:"", publicationStatus:"", imprint:"", publisher:"", wholesalers:"", salesRights:"", subject:"", identifier:"0123b1e0-b723-470b-9143-8a2a74edcfb2", populateAggregations:true, pubDateFrom:"1996-10-14", targetAudienceCode:"", pubDateTo:"2022-10-14",isAutoCompleteSearch:false) { resultCount publicationdate publisher author isbn13 title productform languagetext audiences author noofpages publicationstatus productclassifiers productclassifiercodes subtitle wholesalers markets publicationstatus cop bookName imprint imageurl rpgList bucketDTO distributors}}

dadoonet · October 14, 2021, 1:51pm

I'd like to see the HTTP Request which is sent to Elasticsearch.

I cannot guess from that how it is then translated into the Query DSL.

Srini12 · October 16, 2021, 8:00am

Hi David,
Apologies for the delay, please find the HTTP request below:

GET /idx-myelasticindex/_search
{
  "size": 0,
  "query": {
    "match": {
      "bookName": "Prooi"
    }
  },
  "aggs": {
    "Terms_Aggregation": {
      "terms": {
        "field": "cop.keyword"
      }
    },
    "Author_Aggregation": {
      "terms": {
        "field": "author.keyword"
      }
    },
    "Format_Aggregation": {
      "terms": {
        "field": "productform.keyword"
      }
    },
    "Status_Aggregation": {
      "terms": {
        "field": "publicationstatus.keyword"
      }
    },
    "Readership_Aggregation": {
      "terms": {
        "field": "audiences.keyword"
      }
    }
  }
}
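To compare timings, it helps to produce the same query with and without the aggregations. A minimal Python sketch that builds both variants of the request above from one table of aggregation names and fields; the `populate_aggregations` flag mirrors the `populateAggregations` parameter in the thread, but how the poster's own code uses it is an assumption.

```python
import json

# Aggregation name -> keyword field, copied from the request above.
AGG_FIELDS = {
    "Terms_Aggregation": "cop.keyword",
    "Author_Aggregation": "author.keyword",
    "Format_Aggregation": "productform.keyword",
    "Status_Aggregation": "publicationstatus.keyword",
    "Readership_Aggregation": "audiences.keyword",
}


def build_search_body(book_name, populate_aggregations):
    body = {"size": 0, "query": {"match": {"bookName": book_name}}}
    if populate_aggregations:
        # One terms aggregation per field, each computed over the query's matches.
        body["aggs"] = {name: {"terms": {"field": field}} for name, field in AGG_FIELDS.items()}
    return body


with_aggs = build_search_body("Prooi", populate_aggregations=True)
without_aggs = build_search_body("Prooi", populate_aggregations=False)
print(json.dumps(with_aggs, indent=2))
```

Because both bodies share the same `query`, any difference in `took` between them can be attributed to the aggregations.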

dadoonet · October 16, 2021, 9:36am

And what is the full response from Elasticsearch?
Please share it for both cases: one with the aggs and one without any aggregation, but with the same query.

Srini12 · October 19, 2021, 12:29pm

Hi David
The response object is huge and exceeds the allowed size here, so I have uploaded it to the location below. Can you please get it from there? The first response is with aggregations and the second one is without:

https://www.kwiksnoop.com/documents/elastic.json

thanks in advance.

dadoonet · October 19, 2021, 1:17pm

So. A first look at this gives me:

With aggregations:

{
  "took" : 15311,
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    }
...
}

Without aggregations:

  "took" : 21,

First of all, we can see that without aggs the time is only 21 ms, not 10 s.
Then, the time spent on the aggregations is 15 s for only 6 documents, which does not make sense at all.
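The arithmetic behind that observation, as a small Python helper: `took` is the server-side search time in milliseconds, so with an identical query in both requests the difference approximates the cost of the aggregations. The two values are the ones quoted above.

```python
def aggregation_overhead_ms(took_with_aggs, took_without_aggs):
    # "took" is the time Elasticsearch spent on the search, in milliseconds.
    # With the same query in both requests, the difference approximates the
    # cost of the aggregations alone.
    return took_with_aggs - took_without_aggs


overhead = aggregation_overhead_ms(15311, 21)
print(f"{overhead} ms (~{overhead / 1000:.1f} s) spent on aggregations")
```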

Could you run the same agg again and again, and give the output (only the first lines until hits is enough) after some runs?
Is it still slow?
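A sketch of collecting `took` over repeated runs, so a slow first (cold-cache) run can be told apart from consistently slow ones. `run_search` is a hypothetical callable standing in for whatever client sends the request; the stand-in below uses placeholder numbers, not measurements.

```python
import statistics


def took_over_runs(run_search, runs=5):
    """Send the same request `runs` times and return the "took" values and their median.

    run_search is a hypothetical zero-argument callable that sends the request
    and returns the parsed JSON response as a dict.
    """
    tooks = [run_search()["took"] for _ in range(runs)]
    return tooks, statistics.median(tooks)


# Usage with a stand-in for the real client; placeholder numbers, not measurements.
fake = iter([{"took": 300}, {"took": 120}, {"took": 110}])
tooks, median = took_over_runs(lambda: next(fake), runs=3)
print(tooks, median)
```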

"Hardware is top notch"

What kind of hardware do you have?

Srini12 · October 19, 2021, 1:40pm

Hi David,
Thank you

"First of all, we can see that without aggs, the time is only 21ms. Not 10s.
Then, the time spent on the aggregation is 15s for only 6 documents. Which does not make sense at all."

I am not too worried about the response time without aggregations; the 10 s was for almost 8,000 records. With aggregations, the response time is huge for large data sets.

"Could you run the same agg again and again, and give the output (only the first lines until hits is enough) after some runs?
Is it still slow?"
There is very little improvement: after running it 3 times, the response time is 13.5 seconds.


 "took" : 13597,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },

Below is the server configuration:

RAM: 64 GB
CPU: 16 cores, 2.80 GHz
OS: Windows 10, 64-bit

dadoonet · October 19, 2021, 3:06pm

Could you add "profile": true when you run with the aggs?
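`"profile": true` goes at the top level of the search body, next to `"size"`. Each shard's section of the profile output then lists its aggregations with a `type`, a `description` (the aggregation name) and `time_in_nanos`. A minimal Python sketch that ranks them, slowest first; the sample response is hand-written with placeholder numbers. If the bug mentioned earlier is involved, one might expect an aggregator type such as `StringTermsAggregatorFromFilters` (the class named in the issue title) to dominate, though that is an inference, not something confirmed here.

```python
def slowest_aggregations(response, top=5):
    """Return (name, aggregator type, milliseconds) for the top-level aggregations of a
    search run with "profile": true, slowest first, summed over all shards."""
    totals = {}
    for shard in response.get("profile", {}).get("shards", []):
        for agg in shard.get("aggregations", []):
            key = (agg["description"], agg["type"])
            totals[key] = totals.get(key, 0) + agg["time_in_nanos"]
    rows = [(name, agg_type, nanos / 1_000_000) for (name, agg_type), nanos in totals.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top]


# Hand-written sample; placeholder numbers, not real profile output.
sample = {
    "profile": {"shards": [{"aggregations": [
        {"description": "Format_Aggregation", "type": "SomeAggregator", "time_in_nanos": 2_000_000},
        {"description": "Author_Aggregation", "type": "SomeAggregator", "time_in_nanos": 5_000_000},
    ]}]}
}
print(slowest_aggregations(sample))
```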

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v

If some outputs are too big, please share them on gist.github.com and link them here.

Srini12 · October 25, 2021, 9:13am

Hi David,
Please see the picture below for the queries.

dadoonet · October 25, 2021, 11:15am

Please don't post images of text as they are hard to read, may not display correctly for everyone, and are not searchable.

Instead, paste the text and format it with </> icon or pairs of triple backticks (```), and check the preview window to make sure it's properly formatted before posting it. This makes it more likely that your question will receive a useful answer.

It would be great if you could update your post to solve this.

Also add the first query I asked for:

GET /

Thanks

Srini12 · October 28, 2021, 8:22am

Hi David,
Please see below. You may need to copy and paste the stats into a text editor to read them properly.
I will send the output of GET / in my next post.

Result for: GET /_cat/nodes?v

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role   master name
127.0.0.1           44          85  65                          cdfhilmrstw *      <<server name>>

GET /_cat/health?v

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1635152186 08:56:26  elasticsearch yellow          1         1     18  18    0    0        2             0                  -                 90.0%

GET /_cat/indices?v

health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_7.13.3_001              zTV0ZBDYR7OHohvEaxo_ng   1   0        345           39      4.9mb          4.9mb
green  open   .apm-agent-configuration        kQqZGYSQHaQN16hLlHQsQ    1   0          0            0       208b           208b
green  open   .tasks                          2eEdiJmdTgmOEtKudXhHAQ   1   0        136            0       81kb           81kb
green  open   .kibana_task_manager_7.13.3_001 4UjZ7VU7RQ2qP9YeFXHKGw   1   0         11         1005      4.2mb          4.2mb
green  open   .security-7                     TBOvtIRWRCipq97T1Oew3Q   1   0         55            0    268.4kb        268.4kb
green  open   .apm-custom-link                6IkqyS-sTICtQt44VtI1Iw   1   0          0            0       208b           208b
green  open   .kibana-event-log-7.13.3-000001 MoXe_5IARTqYnLcanYYgrg   1   0         85            0       31kb           31kb
green  open   .kibana-event-log-7.13.3-000003 JJD5JlrPQfiMnqP_s9Sf2Q   1   0          5            0     27.1kb         27.1kb
green  open   .kibana-event-log-7.13.3-000002 N5M6KzlHQkOQO1zJRJRd5g   1   0         25            0     36.6kb         36.6kb
green  open   .kibana-event-log-7.13.3-000004 jeYUrCRNQLW4_wgzk5npkg   1   0         13            0     18.6kb         18.6kb
yellow open   idx-autherindex                 FK7YBB16SIOzMz_TxrvP_w   1   1          3            0     70.6kb         70.6kb
green  open   .async-search                   Ev7YbBA1SFWlwCAvM1Ij9w   1   0          0            0      6.7kb          6.7kb
yellow open   idx-discoveryproduct            Gw5EN2OSSkCw0kEhiTeKMg   1   1   55990270     24907862     96.2gb         96.2gb

Srini12 · October 29, 2021, 8:55am

Hi David,
Please find the output of GET / below, as requested.

{
  "name" : "my-server-name",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "uTFhdzzdSDuSo7RB9NUg1g",
  "version" : {
    "number" : "7.13.3",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "5d21bea28db1e89ecc1f66311ebdec9dc3aa7d64",
    "build_date" : "2021-07-02T12:06:10.804015202Z",
    "build_snapshot" : false,
    "lucene_version" : "8.8.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

dadoonet · October 29, 2021, 1:22pm

I'm guessing that you are searching in idx-discoveryproduct, right?

health status index                 pri rep   docs.count   docs.deleted   store.size     pri.store.size
yellow open   idx-discoveryproduct  1   1     55990270     24907862       96.2gb         96.2gb

So you have one single shard of 96 GB. That's too much IMO.
It should have at least 2 shards, and maybe more.
I'd try to split this index into 5 shards (roughly 20 GB each) to see if things get better.
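The shard-count arithmetic behind "5 shards of about 20 GB", as a Python sketch using the 96.2 GB store size from the `_cat/indices` output; the 20 GB target is the figure suggested above, not a hard rule.

```python
import math


def shards_for_target_size(store_size_gb, target_shard_gb):
    # Smallest number of primary shards that keeps each one at or below the target size.
    return math.ceil(store_size_gb / target_shard_gb)


count = shards_for_target_size(96.2, 20)
print(count, round(96.2 / count, 1))  # 5 shards of roughly 19.2 GB each
```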

You can try the Split API.
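A sketch of the request sequence for the Split API, generated in Python. As I understand the Split API documentation, the source index must be made read-only first, and an index with a single primary shard can be split into any number of primary shards greater than one. The target index name is hypothetical; since the split creates a new index, searches would have to be pointed at it afterwards (not shown). The last step clears the write block in case it was copied to the new index, and is harmless if it was not.

```python
import json

SOURCE = "idx-discoveryproduct"
TARGET = "idx-discoveryproduct-split"  # hypothetical name for the new index


def split_requests(source, target, shards):
    """Return (method, path, body) tuples for splitting `source` into `target`."""
    return [
        # 1. Block writes: the source index must be read-only before it can be split.
        ("PUT", f"/{source}/_settings", {"settings": {"index.blocks.write": True}}),
        # 2. Create the new index with more primary shards.
        ("POST", f"/{source}/_split/{target}", {"settings": {"index.number_of_shards": shards}}),
        # 3. Clear the write block on the new index (null resets the setting).
        ("PUT", f"/{target}/_settings", {"settings": {"index.blocks.write": None}}),
    ]


for method, path, body in split_requests(SOURCE, TARGET, 5):
    print(method, path)
    print(json.dumps(body, indent=2))
```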

Also I can see that you have a lot of deletes. Are you doing a lot of updates?
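To put "a lot of deletes" into numbers: an update in Elasticsearch marks the old document version as deleted, and it keeps occupying the shard until a merge removes it. A small Python sketch computing the deleted share from the `docs.count` and `docs.deleted` values reported for idx-discoveryproduct:

```python
def deleted_ratio(docs_count, docs_deleted):
    # Share of documents in the shard that are marked deleted but not yet merged away.
    return docs_deleted / (docs_count + docs_deleted)


ratio = deleted_ratio(55_990_270, 24_907_862)
print(f"{ratio:.1%}")  # roughly 30.8% of the documents are deleted ones
```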

Srini12 · October 29, 2021, 1:37pm

Yes, idx-discoveryproduct is the index I am working on, and we have lots of updates.
It is already in production; what is the impact if I split it into multiple shards now?
Thanks

Related topics

Topic	Replies	Views	Activity
Aggregations after upgrading to ES 7 - request slowed down (Elasticsearch)	5	451	April 27, 2020
Elasticsearch Aggregations taking a long time (Elasticsearch)	5	2396	July 5, 2017
Aggregations slow after inserts/updates (Elasticsearch)	11	632	August 21, 2019
Slow aggregation no matter the size of the result set (Elasticsearch)	3	483	October 26, 2018
Bad performance on aggregations (Elasticsearch)	5	480	July 6, 2017