Search_after queries are not performing as expected

Hi all!

I'm working on building a searcher for a site. This site generates a lot of writes (new or modified documents), so we need some mechanism to ensure pagination consistency when sorting by certain criteria (freshness or price, in our case).

We began using the search_after parameter to solve this. It worked fine until we noticed that our query slow log was filling up with entries. At times this led to high CPU usage and high search latency, which degraded our service.

After a deeper look, we realized that all the slow queries look like the following one:

{
  "size": 30,
  "sort": [
    {
      "sortDate": {
        "order": "desc"
      },
      "id": {
        "order": "desc"
      }
    }
  ],
  "search_after": [
    1636109510000,
    {any_id}
  ],
  "track_total_hits": 10000
}

All of these log entries show queries without filters or search terms that use the search_after parameter. This kind of query takes ~400ms to resolve, while its equivalent using simple from/size pagination takes just ~20ms.

So, why is ES taking more time/resources to resolve the query with "search_after" than its equivalent with from/size?

We realized that this issue only happens when searching without filters (our index has ~15M documents); when we filter by some criteria that narrows down the dataset, the issue disappears.
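For illustration, a filtered variant of the same request might look like this (the category filter is purely hypothetical; any criterion that narrows the dataset had the same effect for us):

{
  "size": 30,
  "query": {
    "bool": {
      "filter": [
        { "term": { "category": "cars" } }
      ]
    }
  },
  "sort": [
    { "sortDate": { "order": "desc" } },
    { "id": { "order": "desc" } }
  ],
  "search_after": [
    1636109510000,
    {any_id}
  ],
  "track_total_hits": 10000
}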

Following a suggestion, we discovered index sorting. After applying it, we reduced search latency dramatically for both queries (with and without search_after), but only for one of our sorting criteria.
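For reference, the index sorting settings we applied look roughly like this (a sketch only: the index name and mapping types are assumptions, and the sort mirrors the sortDate/id criteria from the query above):

PUT ads-sorted-example
{
  "settings": {
    "index": {
      "sort.field": [ "sortDate", "id" ],
      "sort.order": [ "desc", "desc" ]
    }
  },
  "mappings": {
    "properties": {
      "sortDate": { "type": "date" },
      "id": { "type": "keyword" }
    }
  }
}

Since index sorting can only be defined at index creation and an index can only have a single sort, it helps exactly one of our sorting criteria.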

So, our questions are:

Why are some queries using search_after performing so badly compared with their from/size equivalents?
If the solution is to help the index with index sorting, what can we do to improve query times for our other sorting criteria?

Some examples of the queries before applying index sorting (ignore the localhost, I'm just using a proxy to reach the production cluster):


What Elasticsearch version are you using?

Can you please post the output of the Profile API for this slow sort query with search_after, and also for an equivalent from/size query that is much faster?
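For example, you can just add "profile": true to the search body of the existing request:

{
  "profile": true,
  "size": 30,
  "sort": [
    { "sortDate": { "order": "desc" } },
    { "id": { "order": "desc" } }
  ],
  "search_after": [
    1636109510000,
    {any_id}
  ],
  "track_total_hits": 10000
}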


Hi Mayya! Thanks for answering,

We are using 7.10.

Profile of a from/size query:

{
  "took": 56,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "profile": {
    "shards": [
      {
        "id": "[2ZdWfwd_T6ubj-jNQsivNA][ads-202111121136][0]",
        "searches": [
          {
            "query": [
              {
                "type": "BooleanQuery",
                "description": "#*:* LongDistanceFeatureQuery(field=,origin=999999999,pivotDistance=499999999)",
                "time_in_nanos": 20188168,
                "breakdown": {
                  "set_min_competitive_score_count": 257,
                  "match_count": 0,
                  "shallow_advance_count": 0,
                  "set_min_competitive_score": 12819400,
                  "next_doc": 3998602,
                  "match": 0,
                  "next_doc_count": 5543,
                  "score_count": 1425,
                  "compute_max_score_count": 0,
                  "compute_max_score": 0,
                  "advance": 334259,
                  "advance_count": 27,
                  "score": 1192773,
                  "build_scorer_count": 54,
                  "create_weight": 363068,
                  "shallow_advance": 0,
                  "create_weight_count": 1,
                  "build_scorer": 1480066
                },
                "children": [
                  {
                    "type": "MatchAllDocsQuery",
                    "description": "*:*",
                    "time_in_nanos": 1028258,
                    "breakdown": {
                      "set_min_competitive_score_count": 0,
                      "match_count": 0,
                      "shallow_advance_count": 0,
                      "set_min_competitive_score": 0,
                      "next_doc": 0,
                      "match": 0,
                      "next_doc_count": 0,
                      "score_count": 0,
                      "compute_max_score_count": 0,
                      "compute_max_score": 0,
                      "advance": 936037,
                      "advance_count": 9087,
                      "score": 0,
                      "build_scorer_count": 81,
                      "create_weight": 4463,
                      "shallow_advance": 0,
                      "create_weight_count": 1,
                      "build_scorer": 87758
                    }
                  },
                  {
                    "type": "LongDistanceFeatureQuery",
                    "description": "LongDistanceFeatureQuery(field=,origin=999999999,pivotDistance=499999999)",
                    "time_in_nanos": 13833607,
                    "breakdown": {
                      "set_min_competitive_score_count": 257,
                      "match_count": 0,
                      "shallow_advance_count": 81,
                      "set_min_competitive_score": 12529039,
                      "next_doc": 0,
                      "match": 0,
                      "next_doc_count": 0,
                      "score_count": 1425,
                      "compute_max_score_count": 54,
                      "compute_max_score": 7509,
                      "advance": 676901,
                      "advance_count": 5395,
                      "score": 501172,
                      "build_scorer_count": 54,
                      "create_weight": 1084,
                      "shallow_advance": 11638,
                      "create_weight_count": 1,
                      "build_scorer": 106264
                    }
                  }
                ]
              }
            ],
            "rewrite_time": 36734,
            "collector": [
              {
                "name": "SimpleFieldCollector",
                "reason": "search_top_hits",
                "time_in_nanos": 15452992
              }
            ]
          }
        ],
        "aggregations": []
      },
      {
        "id": "[W5AN-Ap6R9iy0R9OjMuHUg][ads-202111121136][1]",
        "searches": [
          {
            "query": [
              {
                "type": "BooleanQuery",
                "description": "#*:* LongDistanceFeatureQuery(field=,origin=999999999,pivotDistance=499999999)",
                "time_in_nanos": 21845509,
                "breakdown": {
                  "set_min_competitive_score_count": 289,
                  "match_count": 0,
                  "shallow_advance_count": 0,
                  "set_min_competitive_score": 14510762,
                  "next_doc": 3127208,
                  "match": 0,
                  "next_doc_count": 6003,
                  "score_count": 3342,
                  "compute_max_score_count": 0,
                  "compute_max_score": 0,
                  "advance": 225705,
                  "advance_count": 23,
                  "score": 1783918,
                  "build_scorer_count": 46,
                  "create_weight": 341011,
                  "shallow_advance": 0,
                  "create_weight_count": 1,
                  "build_scorer": 1856905
                },
                "children": [
                  {
                    "type": "MatchAllDocsQuery",
                    "description": "*:*",
                    "time_in_nanos": 747234,
                    "breakdown": {
                      "set_min_competitive_score_count": 0,
                      "match_count": 0,
                      "shallow_advance_count": 0,
                      "set_min_competitive_score": 0,
                      "next_doc": 0,
                      "match": 0,
                      "next_doc_count": 0,
                      "score_count": 0,
                      "compute_max_score_count": 0,
                      "compute_max_score": 0,
                      "advance": 651840,
                      "advance_count": 9324,
                      "score": 0,
                      "build_scorer_count": 69,
                      "create_weight": 4266,
                      "shallow_advance": 0,
                      "create_weight_count": 1,
                      "build_scorer": 91128
                    }
                  },
                  {
                    "type": "LongDistanceFeatureQuery",
                    "description": "LongDistanceFeatureQuery(field=,origin=999999999,pivotDistance=499999999)",
                    "time_in_nanos": 15461591,
                    "breakdown": {
                      "set_min_competitive_score_count": 289,
                      "match_count": 0,
                      "shallow_advance_count": 69,
                      "set_min_competitive_score": 13920752,
                      "next_doc": 0,
                      "match": 0,
                      "next_doc_count": 0,
                      "score_count": 3342,
                      "compute_max_score_count": 46,
                      "compute_max_score": 7272,
                      "advance": 673629,
                      "advance_count": 5710,
                      "score": 750093,
                      "build_scorer_count": 46,
                      "create_weight": 1034,
                      "shallow_advance": 11370,
                      "create_weight_count": 1,
                      "build_scorer": 97441
                    }
                  }
                ]
              }
            ],
            "rewrite_time": 34905,
            "collector": [
              {
                "name": "SimpleFieldCollector",
                "reason": "search_top_hits",
                "time_in_nanos": 19567276
              }
            ]
          }
        ],
        "aggregations": []
      }
    ]
  }
}

Profile of the equivalent query using search_after:

{
  "took": 977,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "profile": {
    "shards": [
      {
        "id": "[OXyAsvfDQqm5f6kWJze12g][ads-202111121136][0]",
        "searches": [
          {
            "query": [
              {
                "type": "MatchAllDocsQuery",
                "description": "*:*",
                "time_in_nanos": 335185552,
                "breakdown": {
                  "set_min_competitive_score_count": 0,
                  "match_count": 0,
                  "shallow_advance_count": 0,
                  "set_min_competitive_score": 0,
                  "next_doc": 334529838,
                  "match": 0,
                  "next_doc_count": 9621019,
                  "score_count": 0,
                  "compute_max_score_count": 0,
                  "compute_max_score": 0,
                  "advance": 9846,
                  "advance_count": 23,
                  "score": 0,
                  "build_scorer_count": 46,
                  "create_weight": 8788,
                  "shallow_advance": 0,
                  "create_weight_count": 1,
                  "build_scorer": 637080
                }
              }
            ],
            "rewrite_time": 21956,
            "collector": [
              {
                "name": "PagingFieldCollector",
                "reason": "search_top_hits",
                "time_in_nanos": 531797539
              }
            ]
          }
        ],
        "aggregations": []
      },
      {
        "id": "[OXyAsvfDQqm5f6kWJze12g][ads-202111121136][1]",
        "searches": [
          {
            "query": [
              {
                "type": "MatchAllDocsQuery",
                "description": "*:*",
                "time_in_nanos": 334356408,
                "breakdown": {
                  "set_min_competitive_score_count": 0,
                  "match_count": 0,
                  "shallow_advance_count": 0,
                  "set_min_competitive_score": 0,
                  "next_doc": 333719756,
                  "match": 0,
                  "next_doc_count": 9629720,
                  "score_count": 0,
                  "compute_max_score_count": 0,
                  "compute_max_score": 0,
                  "advance": 9127,
                  "advance_count": 25,
                  "score": 0,
                  "build_scorer_count": 50,
                  "create_weight": 8304,
                  "shallow_advance": 0,
                  "create_weight_count": 1,
                  "build_scorer": 619221
                }
              }
            ],
            "rewrite_time": 20651,
            "collector": [
              {
                "name": "PagingFieldCollector",
                "reason": "search_top_hits",
                "time_in_nanos": 531982666
              }
            ]
          }
        ],
        "aggregations": []
      }
    ]
  }
}

For both responses, I have intentionally removed the hits section.

I see what's happening.

From 7.6 we introduced an optimization for sort queries: we rewrite a sort query as a LongDistanceFeatureQuery that can skip over blocks of documents, which makes sort queries much faster under certain conditions. This optimization is currently not applied to the search_after case. From 7.16 it will be applied to search_after as well, so sort queries with search_after will also be much faster under certain conditions.

The problem with from/size, even though it is faster now, is the extra pressure on memory when the from value is big. When you use from=30 and size=30, we in fact request 60 documents from every shard, and only the coordinating node, after merging the sorted docs from all shards, can return the next 30 docs from the merged list.
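For illustration, the from/size variant discussed here is just the same sorted request with from/size instead of search_after (values are illustrative):

{
  "from": 30,
  "size": 30,
  "sort": [
    { "sortDate": { "order": "desc" } },
    { "id": { "order": "desc" } }
  ],
  "track_total_hits": 10000
}

Even though only 30 hits are returned, each shard has to collect the top from + size = 60 documents for this request.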

Yeah, that makes sense, since we noticed that the gap between the two kinds of queries gets smaller as we force deeper pagination.

So, in the meantime, is there anything we can do to improve these queries apart from using index sorting (which fixed our issue, but only for one of the sorting criteria)?

No, there is nothing you can do to improve the performance of queries with search_after except to wait for 7.16, which should be out soon.

Nice, sounds good. Thanks for the help! :slight_smile:


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.