Why are Range Queries on Nested Docs Slow?

Hello,

I was wondering if I could get some insight and possibly some explanation on some search performance I've noticed when using range queries with nested and non-nested documents.

I've been performing some benchmarking against ES 7.10 on a 3 data node cluster. I ran two types of queries on a 3 shard index that has 5 million documents each having 400 nested documents. The nested documents have a doc_id property, each document will contain the same set of nested document ids but with different values for nested_name. (So every document will have a nested doc with doc_id: 2, for instance, with a different value for nested_name)

My field mappings look something like this

Mappings
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "keyword"
      },
      "some_nested_doc": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "keyword"
          },
          "doc_id": {
            "type": "keyword"
          },
          "values": {
            "properties": {
              "nested_name": {
                "fields": {
                  "raw": {
                    "type": "keyword",
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

I ran two queries:

  1. A range query on the top level property name.
  2. A range query on the nested property nested_name.

Query 2 is part of a larger query, but I narrowed down the slowness of that query to this particular range query.

Query 1
{ 
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "name": {
              "gt": "<value>"
            }
          }
        }
      ]
    }
  }
}
Query 2
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "some_nested_doc",
            "query": {
              "bool": {
                "must": [
                  {
                    "filter": {
                      "term": {
                        "some_nested_doc.doc_id": {
                          "value": "<some_id>"
                        }
                      }
                    }
                  },
                  {
                    "range": {
                      "some_nested_doc.values.nested_name.raw": {
                        "gt": "<some_value>"
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

The search times differ quite a bit where Query 1's p90 is around 450ms and Query 2's is around 6 seconds. From looking at profiling traces, I noticed that the majority of the time is spent in the build_scorer. As I understand it, this creates an iterator over the matching document so that they can be scored. However, even running Query 2 within a filtered context yields the same performance results!

After searching a bit on possible optimizations, I realized I could put the doc_id term query inside a filter block. As I understand it, a filter should lower the number of matching docs before the should/must block in the bool block is executed. So in theory, the must block in Updated Query 2 should only be performing against 5 Mil documents (since each document should have a nested document with doc_id=x).

Updated Query 2
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "some_nested_doc",
            "query": {
              "bool": {
                "filter": {
                  "term": {
                    "some_nested_doc.doc_id": {
                      "value": "<some_id>"
                    }
                  }
                },
                "must": {
                  "range": {
                    "some_nested_doc.values.nested_name.raw": {
                      "gt": "<some_value>"
                    }
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}

Updated Query 2 performs exactly the same as Query 2 even though I would assume it performs the search over a smaller nested document set. Why does the performance of Updated Query 2 in this case not match Query 1? (don't they both technically apply a range query over 5 Million documents?) Is there something I'm missing around queries performed within a nested context? Are they inherently slower for some reason?

Thanks!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.