Non-nested query on copy_to fields performs worse then a nested query

Ron_lusha · April 25, 2021, 11:50am

Our service uses a lot of nested queries/data, in order to improve our response time I tried to rewrite our heaviest query and use copy_to to copy the fields I need from inside the nested docs to the root doc and query them, expecting a significant performance improvement

Our indexes look like this (simplified and only the relevant fields for the question)

{
  "mappings": {
    "properties": {
      "nested_field_a" : {
          "type" : "nested",
          "properties" : {
            "inner_field" : {
              "type" : "keyword",
              "copy_to" : [
                "_flat_inner_field_a"
              ]
            }
          }
        },
      "_flat_inner_field_a": {
        "type" : "keyword"
      },
      "nested_field_b" : {
          "type" : "nested",
          "properties" : {
            "inner_field" : {
              "type" : "keyword",
              "copy_to" : [
                "_flat_inner_field_b"
              ]
            }
          }
        },
      "_flat_inner_field_b": {
        "type" : "keyword"
      }
    }
  }
}

Our query queries 6 different indices at once, totaling ~11 billion docs, each doc with between 1-5 (usually only 1), nested docs across the two nested fields (some only have docs in one of the nested fields)

The query tries to find all docs that contain at least one of a bunch of terms, it is guaranteed to find at least 1 doc for each term (can reach tens of docs per term as well)

Our existing query looks like this

"query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "nested_field_a",
            "query": {
              "terms": {
                "nested_field_a.inner_field": [
                  1,
                  2
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "nested_field_b",
            "query": {
              "terms": {
                "nested_field_b.inner_field": [
                  "a",
                  "c"
                ]
              }
            }
          }
        }
      ]
    }
  }

(1,2, a,c are just examples, the lists contain unique values that change with each request, with 1-20 items each)

And the new one like this:

"query": {
    "bool": {
      "should": [
        {
          "terms": {
            "_flat_inner_field_a": [
              1,
              2
            ]
          }
        },
        {
          "terms": {
            "_flat_inner_field_b": [
              "a",
              "c"
            ]
          }
        }
      ]
    }
  }

With the new query, not only do I not see the speed improvement I expected, the query performs up to 40% slower!

Profile for the old query (full profile) :

"type" : "BooleanQuery",
"description" : "ToParentBlockJoinQuery (ConstantScore(nested_field_a.inner_field:1 nested_field_a.inner_field:2)) ToParentBlockJoinQuery (ConstantScore(nested_field_b.inner_field:a nested_field_b.inner_field:c))",
"time" : "697.5micros"

Profile for the new query (full profile) :

"type" : "BooleanQuery",
"description" : "+(ConstantScore(_flat_inner_field_a:1 _flat_inner_field_a:2) ConstantScore(_flat_inner_field_b:a _flat_inner_field_b:c)) #DocValuesFieldExistsQuery [field=_primary_term]",
"time" : "1ms"

The difference I see is that the new query no longer performs the two joins, but adds a DocValuesFieldExistsQuery for some reason

Our production env runs on elastic 7.3, the elastic env I used for the examples above is 7.8
The example above was performed on a local elastic on my pc, a single index with this doc in it

{
  "nested_field_a": [{"inner_field": 1}, {"inner_field": 2}, {"inner_field": 3}],
  "nested_field_b": [{"inner_field": "a"}, {"inner_field": "c"}, {"inner_field": "k"}]
}

I'd appreciate your help with this, thanks

system · May 23, 2021, 11:51am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Copy_to and nested field type Elasticsearch	1	1721	May 19, 2017
Why are Range Queries on Nested Docs Slow? Elasticsearch	1	757	May 28, 2021
Would making a document non-nested improve query performance? Elasticsearch	1	494	December 6, 2018
Nested Fields in an Index Elasticsearch	8	115	December 11, 2024
Adding copy_to support for copying data from object to nested type Elasticsearch	1	1589	July 17, 2018

Non-nested query on copy_to fields performs worse then a nested query

Related topics