Diagnosing the performance bottleneck in search


(Rodrigo Rezende) #1

Hi,

I'm intrigued by the following performance and I'm looking for the root cause:

query

GET /dataset/_search
{
   "size": 0,
   "profile": true
}

returns in 26s: (there are only 5 shards but less than 1M documents).
my expectation is to run it in subsecond. Any idea of what's going on?

{
   "took": 26280,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 915054,
      "max_score": 0,
      "hits": []
   },
   "profile": {
      "shards": [
         {
            "id": "[7_mfMbzNRpyTy_QI5H8HbQ][dataset][1]",
            "searches": [
               {
                  "query": [
                     {
                        "query_type": "MatchAllDocsQuery",
                        "lucene": "*:*",
                        "time": "33.19220000ms",
                        "breakdown": {
                           "match": 0,
                           "score": 0,
                           "create_weight": 8700,
                           "build_scorer": 168100,
                           "next_doc": 33015400,
                           "advance": 0
                        }
                     }
                  ],
                  "rewrite_time": 28900,
                  "collector": [
                     {
                        "name": "TotalHitCountCollector",
                        "reason": "search_count",
                        "time": "22.24710000ms"
                     }
                  ]
               }
            ]
         },
         {
            "id": "[j9Vdg5yTTBmnxzDhk9MzYg][dataset][4]",
            "searches": [
               {
                  "query": [
                     {
                        "query_type": "MatchAllDocsQuery",
                        "lucene": "*:*",
                        "time": "29.22860000ms",
                        "breakdown": {
                           "match": 0,
                           "score": 0,
                           "create_weight": 2000,
                           "build_scorer": 42500,
                           "next_doc": 29184100,
                           "advance": 0
                        }
                     }
                  ],
                  "rewrite_time": 3100,
                  "collector": [
                     {
                        "name": "TotalHitCountCollector",
                        "reason": "search_count",
                        "time": "40.85320000ms"
                     }
                  ]
               }
            ]
         },
         {
            "id": "[j9Vdg5yTTBmnxzDhk9MzYg][dataset][0]",
            "searches": [
               {
                  "query": [
                     {
                        "query_type": "MatchAllDocsQuery",
                        "lucene": "*:*",
                        "time": "36.92300000ms",
                        "breakdown": {
                           "match": 0,
                           "score": 0,
                           "create_weight": 1600,
                           "build_scorer": 52600,
                           "next_doc": 36868800,
                           "advance": 0
                        }
                     }
                  ],
                  "rewrite_time": 11100,
                  "collector": [
                     {
                        "name": "TotalHitCountCollector",
                        "reason": "search_count",
                        "time": "23.46090000ms"
                     }
                  ]
               }
            ]
         },
         {
            "id": "[xNTreWgMRyiiKQ5mwAiKMg][dataset][2]",
            "searches": [
               {
                  "query": [
                     {
                        "query_type": "MatchAllDocsQuery",
                        "lucene": "*:*",
                        "time": "24.18800000ms",
                        "breakdown": {
                           "match": 0,
                           "score": 0,
                           "create_weight": 1900,
                           "build_scorer": 36500,
                           "next_doc": 24149600,
                           "advance": 0
                        }
                     }
                  ],
                  "rewrite_time": 19800,
                  "collector": [
                     {
                        "name": "TotalHitCountCollector",
                        "reason": "search_count",
                        "time": "150.1178000ms"
                     }
                  ]
               }
            ]
         },
         {
            "id": "[znT302KKSceuIcndABCnqA][dataset][3]",
            "searches": [
               {
                  "query": [
                     {
                        "query_type": "MatchAllDocsQuery",
                        "lucene": "*:*",
                        "time": "32.66030800ms",
                        "breakdown": {
                           "match": 0,
                           "score": 0,
                           "create_weight": 2300,
                           "build_scorer": 55499,
                           "next_doc": 32602509,
                           "advance": 0
                        }
                     }
                  ],
                  "rewrite_time": 10599,
                  "collector": [
                     {
                        "name": "TotalHitCountCollector",
                        "reason": "search_count",
                        "time": "24.11841600ms"
                     }
                  ]
               }
            ]
         }
      ]
   }
}

(Rodrigo Rezende) #2

We solved the issue.
If you notice, took time is ~ 26 seconds but all the parts are executed in milliseconds. That's certainly is an inconsistency in the profile response.
The gap was caused by Garbage Collection churning because of this bug: https://github.com/elastic/elasticsearch/pull/20620/files

That also proves the profile API is missing important items to report.


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.