Very slow aggregate query with 15M records

Hello, I have an index with more than 15M records of website logs. When I try to run an aggregation to sum the totals of some fields, the request takes more than 10 seconds and then fails with this error:

```
{
  "statusCode": 502,
  "error": "Bad Gateway",
  "message": "socket hang up"
}
```
This is the query:

```
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "day": {
              "gte": "2024-04-03",
              "lte": "2024-04-04",
              "format": "yyyy-MM-dd"
            }
          }
        },
        {
          "term": {
            "country_alias": "kw"
          }
        }
      ]
    }
  },
  "aggs": {
    "entities": {
      "terms": {
        "field": "entity_title",
        "size": 100000
      },
      "aggs": {
        "records": {
          "top_hits": {
            "size": 1,
            "_source": {
              "includes": [
                "entity_id",
                "entity_type",
                "entity_title",
                "country_alias",
                "main_taxonomy",
                "main_taxonomy_title",
                "owner_phone",
                "created_at",
                "updated_at"
              ]
            }
          }
        },
        "total_visits": {
          "sum": {
            "script": {
              "lang": "painless",
              "source": "doc['visit_ios_count'].value + doc['visit_android_count'].value + doc['visit_huawei_count'].value + doc['visit_web_count'].value"
            }
          }
        },
        "total_calls": {
          "sum": {
            "script": {
              "lang": "painless",
              "source": "doc['call_ios_count'].value + doc['call_android_count'].value + doc['call_huawei_count'].value + doc['call_web_count'].value"
            }
          }
        },
        "total_whatsapp": {
          "sum": {
            "script": {
              "lang": "painless",
              "source": "doc['whatsapp_ios_count'].value + doc['whatsapp_android_count'].value + doc['whatsapp_huawei_count'].value + doc['whatsapp_web_count'].value"
            }
          }
        },
        "total_chat": {
          "sum": {
            "script": {
              "lang": "painless",
              "source": "doc['chat_ios_count'].value + doc['chat_android_count'].value + doc['chat_huawei_count'].value + doc['chat_web_count'].value"
            }
          }
        }
      }
    }
  }
}
```

Can anyone help me fix this issue?

@dadoonet Can you help me?

This is a community forum where everyone volunteers, so it is considered rude to ping people who are not already involved in the conversation. Please also be patient, as it can take time to get questions answered. The more specific a question is, the more time it may take, as not everyone may know the answer or that area. If you have not received any response after 2 or 3 business days, it is usually fine to bump the thread.

When asking a question, it is always useful to specify which version of Elasticsearch you are using and some details about the size and configuration of your cluster.

I have never seen the error you posted. Do you have a proxy in between the client and Elasticsearch that could cause the error?

How many shards is the data spread across? What is the size of these?
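
If you are not sure, the cat shards API lists shard count and size per index, for example:

```
GET _cat/shards?v
```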

When your query runs and is slow, do you notice high CPU usage?

Given that you use scripting to sum up different fields within each document, have you tried adding these precomputed sums to the documents instead, so you do not have to use scripts?
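
As a rough sketch of that idea (the pipeline name and the stored field names here are illustrative, not taken from your mapping), an ingest pipeline could compute the per-document totals once at index time:

```
PUT _ingest/pipeline/precompute-totals
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.total_visits = ctx.visit_ios_count + ctx.visit_android_count + ctx.visit_huawei_count + ctx.visit_web_count; ctx.total_calls = ctx.call_ios_count + ctx.call_android_count + ctx.call_huawei_count + ctx.call_web_count;"
      }
    }
  ]
}
```

The same pattern applies to the whatsapp and chat counters. Each sum aggregation can then reference a plain field, e.g. `"total_visits": { "sum": { "field": "total_visits" } }`, and existing documents could be backfilled by running `_update_by_query` with that pipeline.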


I'm so sorry for the mistake, and all respect to the community.

I have never seen the error you posted. Do you have a proxy in between the client and Elasticsearch that could cause the error?

I asked our DevOps engineer, and he said that Cloudflare has been added in between.

I'm using the latest version of Elasticsearch, 8.15.

This is the cat shards output:

```
classified-log-report-per-day 0 p STARTED    54000 28.4mb 28.4mb 172.18.0.2 Elasticsearch
classified-log-report-per-day 0 r UNASSIGNED
```

Finally, I'm using the sum script because this data is synced from MongoDB.

=========================

I'll share an example of the final result. In this example, I'm grouping the totals for each entity_title to get the totals over a date range, for example one month (the adjusted range clause is shown after the result).

```
{
  "took": 42,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "entities": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 27344,
      "buckets": [
        {
          "key": "ابحث عن عمل ",
          "doc_count": 21,
          "records": {
            "hits": {
              "total": {
                "value": 21,
                "relation": "eq"
              },
              "max_score": 1.6797194,
              "hits": [
                {
                  "_index": "classified-log-report-per-day",
                  "_id": "66a9ad4f72912a8f4a00cf71",
                  "_score": 1.6797194,
                  "_source": {
                    "entity_title": "ابحث عن عمل ",
                    "created_at": "2024-07-31T03:19:43.072000",
                    "updated_at": "2024-07-31T03:19:43.072000",
                    "country_alias": "kw",
                    "main_taxonomy": 195,
                    "owner_phone": "+96599615720",
                    "entity_id": 10464740,
                    "entity_type": "post",
                    "main_taxonomy_title": {
                      "ar": "وظائف / باحثون عن عمل",
                      "en": "Jobs"
                    }
                  }
                }
              ]
            }
          },
          "total_visits": {
            "value": 53
          },
          "total_calls": {
            "value": 0
          },
          "total_whatsapp": {
            "value": 3
          },
          "total_chat": {
            "value": 0
          }
        }
      ]
    }
  }
}
```
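
For a one-month window, only the range clause in the query above changes; the dates here are just an example:

```
"range": {
  "day": {
    "gte": "2024-04-01",
    "lte": "2024-04-30",
    "format": "yyyy-MM-dd"
  }
}
```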

That probably explains the error you are seeing.

It looks like you have a single node. As the index is very small, it is odd that the query takes that long, even if it uses scripts.

It would help if you could provide details about the amount of resources available to this node, e.g. CPU cores, RAM, heap size and the type of storage used.
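
Most of those details can be pulled from the nodes stats API, for example:

```
GET _nodes/stats/os,jvm,fs
```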

It would also help if you could monitor CPU usage while the query is running.



Can you check the image above?

That does not really answer my question about provisioned resources. Was it taken when the query was running?