Performance degradation of Significant Terms Aggregation after upgrade (v2.4 -> v5.4)

We tried to migrate from 2.4 to 5.4, but we noticed quite a significant performance degradation.

Performance decreases significantly, especially for the Significant Terms Aggregation.

We suspect the change to collect_mode is involved, but are there other factors to consider?

Sample Query

{
  "query": {
    "query_string": {
      "query": "some_ids:259352",
      "default_operator": "AND"
    }
  },
  "size": 10,
  "aggs": {
    "org_cat": {
      "significant_terms": {
        "field": "org_cat",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": {
          "background_is_superset": false
        },
        "size": 100
      }
    },
    "keyword": {
      "significant_terms": {
        "field": "keyword",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": {
          "background_is_superset": false
        },
        "size": 100
      }
    },
    "domain": {
      "significant_terms": {
        "field": "domain",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": {
          "background_is_superset": false
        },
        "size": 100
      }
    },
    "ua_name": {
      "significant_terms": {
        "field": "ua_name",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": {
          "background_is_superset": false
        },
        "size": 100
      }
    },
    "ip": {
      "significant_terms": {
        "field": "ip_addr",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": {
          "background_is_superset": false
        },
        "size": 100
      }
    },
    "org_name": {
      "significant_terms": {
        "field": "org_name",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": {
          "background_is_superset": false
        },
        "size": 100
      }
    },
    "customer_ids": {
      "significant_terms": {
        "field": "customer_ids",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": {
          "background_is_superset": false
        },
        "size": 100
      }
    },
    "pref_code": {
      "significant_terms": {
        "field": "pref_code",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": {
          "background_is_superset": false
        },
        "size": 100
      }
    },
    "org_emp_code": {
      "significant_terms": {
        "field": "org_emp_code",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": {
          "background_is_superset": false
        },
        "size": 100
      }
    },
    "ua_os": {
      "significant_terms": {
        "field": "ua_os",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": {
          "background_is_superset": false
        },
        "size": 100
      }
    },
    "org_gross_code": {
      "significant_terms": {
        "field": "org_gross_code",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": {
          "background_is_superset": false
        },
        "size": 100
      }
    },
    "some_ids": {
      "significant_terms": {
        "field": "segment_ids",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": {
          "background_is_superset": false
        },
        "size": 100
      }
    }
  }
}

2.4 Response - 16s

{
    "took": 16564,
    "timed_out": false,
    "_shards": {
        "total": 480,
        "successful": 480,
        "failed": 0
    },
    "hits": {
        "total": 2965312,
        "max_score": 5.930258,
        "hits": []
    },
    "aggregations": {
        "ua_name": {},
        "org_name.raw": {},
        "segment_ids": {},
        "org_gross_code": {},
        "domain": {},
        ....
    }
}

5.4 Response - 1.5m

{
    "took": 91375,
    "timed_out": false,
    "_shards": {
        "total": 480,
        "successful": 480,
        "failed": 0
    },
    "hits": {
        "total": 2948700,
        "max_score": 1,
        "hits": []
    },
    "aggregations": {
        "ua_name": {},
        "org_name.raw": {},
        "segment_ids": {},
        "org_gross_code": {},
        "domain": {},
        ....
    }
}

Other info

Try narrowing down the response times for the individual fields to find the culprit.
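
For example (a sketch that simply reuses the query above with a single aggregation), you could run one significant_terms agg at a time and compare the took values per field:

{
  "query": {
    "query_string": {
      "query": "some_ids:259352",
      "default_operator": "AND"
    }
  },
  "size": 0,
  "aggs": {
    "ip": {
      "significant_terms": {
        "field": "ip_addr",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": { "background_is_superset": false },
        "size": 100
      }
    }
  }
}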

My expectation is that the fields of type ip are the problem. In more recent versions of Lucene, the IP field does not hold a count that we can directly look up for the background doc frequency (DF). The work-around used internally to find the frequency for IP values is to effectively run a query that examines the postings list for ip X and counts the number of docs, similar to how we look up DFs when you apply a custom background_filter. This is obviously slower.

The workaround for you would be to also index the ip fields as type keyword, which would retain the ability to do fast DF lookups, e.g.

      "remote_host_ip" : {
        "type" : "ip",
        "fields" : {
          "asKeyword" : {
            "type" : "keyword"
          }
        }
      },

... then run the significant_terms agg on the remote_host_ip.asKeyword field.
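
A sketch of the rewritten aggregation, assuming the multi-field mapping above (an ip field named remote_host_ip with an asKeyword sub-field):

{
  "aggs": {
    "ip": {
      "significant_terms": {
        "field": "remote_host_ip.asKeyword",
        "shard_size": 300,
        "min_doc_count": 10,
        "gnd": { "background_is_superset": false },
        "size": 100
      }
    }
  }
}

Note that adding the sub-field to the mapping only affects newly indexed documents, so existing data would need to be reindexed before the keyword sub-field can be aggregated on.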

Thanks Mark!

Your advice was effective.
Does this also apply to numeric fields (integer, long, ...)?

Also, I find that a large shard_size causes a crash (due to heavy GC) on 5.4, but this does not happen on 2.4.5.
Is this expected behavior?

  • shard_size < 1000: 5.4 is faster
  • 1000 ≤ shard_size < 20000: 2.4 is faster
  • shard_size > 20000: 5.4 crashes (heavy GC)

Query

{
    "query": {
        "query_string": {
            "query": "some_ids:10000",
            "default_operator": "AND"
        }
    },
    "aggs": {
        "keywords.raw": {
            "significant_terms": {
                "field": "keywords.raw",
                "shard_size": 100000,
                "min_doc_count": 10,
                "gnd": {
                    "background_is_superset": false
                },
                "size": 100000
            }
        }
    },
    "size": 0
}

Unfortunately, yes. The related code change to the frequency lookups was in "Make significant terms work on fields that are indexed with points" by jpountz (Pull Request #18031, elastic/elasticsearch on GitHub).
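
For numeric fields, the same multi-field workaround as for the ip field could be applied (a hypothetical mapping sketch; the field name is invented for illustration):

      "some_numeric_id" : {
        "type" : "long",
        "fields" : {
          "asKeyword" : {
            "type" : "keyword"
          }
        }
      },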

By treating the numbers/ips as 'keyword' fields, the internal agg working state required to hold large sets of match candidates may also be less efficient in terms of RAM than in previous versions.

Is that caused only by treating the numbers/ips as 'keyword' fields?

I think the Significant Terms Aggregation on 'keyword' fields (not_analyzed strings in 2.4) may also be less efficient in terms of RAM than in previous versions.
Isn't that right?

One other thing to look at: the various settings for 'execution_hint' have a part to play in how interim state is held.
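
For example (a sketch only, not a tuned recommendation for this data set), the hint can be set per aggregation, e.g. with values such as map or global_ordinals:

{
  "aggs": {
    "keywords.raw": {
      "significant_terms": {
        "field": "keywords.raw",
        "execution_hint": "map",
        "shard_size": 1000,
        "min_doc_count": 10,
        "size": 100
      }
    }
  }
}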

Thanks Mark.
I will check it!

Let's go for a drink someday :)

Always good to hear what people are using significant terms to find :)
