Performance impact of the "include/exclude" fields of an aggregation

deviantony · December 11, 2015, 10:40am

Hello there, I'd love to know what the performance impact is of using these fields in an aggregation on a big dataset.

Since they rely on regular expressions, I'm not sure how they would perform against a lot of data.

tanguy · December 11, 2015, 10:48am

Hi,

Can you please give us an example of what you'd like to do?

Mark_Harwood · December 11, 2015, 11:05am

A single string is interpreted as a regex pattern, which can be slow. Alternatively, you can pass an array of values; these are treated as exact-value filters and are looked up in a hashmap for the docs matching your query.
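
As a rough illustration of the array form, here is a minimal Python sketch that builds a terms aggregation body whose "include" is a list of exact values rather than a regex. The helper names (direct_children, terms_agg_with_exact_include) and the CATEGORY_TREE list are hypothetical, and the sketch assumes the client already knows the category tree and writes paths with a leading slash. It only builds the request body as a dict; sending it to Elasticsearch is left out.

```python
def direct_children(parent, known_paths):
    """Return the known paths that sit exactly one level below `parent`."""
    prefix = parent.rstrip("/") + "/"
    return sorted(
        p for p in known_paths
        if p.startswith(prefix) and "/" not in p[len(prefix):]
    )


def terms_agg_with_exact_include(field, values):
    """Build a terms aggregation whose `include` is a list of exact values."""
    return {"terms": {"field": field, "include": list(values)}}


# Hypothetical category tree kept on the client side (leading slashes assumed).
CATEGORY_TREE = [
    "/Smartphone",
    "/Smartphone/Android",
    "/Smartphone/Android/5.1",
    "/Smartphone/Android/6.0",
    "/Smartphone/IOS",
]

children = direct_children("/Smartphone", CATEGORY_TREE)
agg_body = {
    "aggs": {
        "category": terms_agg_with_exact_include("categoryPath.tokenized", children)
    }
}
```

With this approach the regex work moves out of the aggregation: the client decides which exact terms it wants buckets for, at the cost of having to know the tree up front.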

deviantony · December 11, 2015, 12:38pm

Using the following entities:

{
  "label": "Galaxy S4",
  "categoryPath": ["Smartphone/Android/5.1"]
}

{
  "label": "Galaxy S6",
  "categoryPath": ["Smartphone/Android/6.0"]
}

{
  "label": "Iphone 6s",
  "categoryPath": ["Smartphone/IOS"]
}

And the category tree for this example:

| /
| / Smartphone
| / Smartphone / Android
| / Smartphone / Android / 5.1
| / Smartphone / Android / 6.0
| / Smartphone / IOS

What I would like to do is retrieve the number of products per category level, e.g. how many products are located in the "Smartphone" category? I expect it to return two buckets, for the direct child categories only (Android and IOS).

I'm currently using the following query to retrieve how many products are located in the "Smartphone" category:

GET my_index/product/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "categoryPath.tokenized": "/Smartphone"
        }
      }
    }
  },
  "aggs": {
    "category": {
      "terms": {
        "field": "categoryPath.tokenized",
        "size": 0,
        "include": "\/Smartphone\/.*",
        "exclude": "\/Smartphone\/.*\/.*"
      }
    }
  }
}
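
To see which terms the include/exclude pair above should keep, here is a small Python sketch that simulates it outside Elasticsearch. It rests on several assumptions: paths are indexed with a leading slash (as the patterns in the query imply), the path_hierarchy tokenizer emits one token per ancestor prefix, and the regex must match the whole term. The function names are hypothetical, and Python's re module is only a stand-in for the Lucene regex engine, which is adequate for patterns this simple.

```python
import re


def path_hierarchy_tokens(path, delimiter="/"):
    """Approximate path_hierarchy: one token per ancestor prefix.

    Assumption: every token starts with the delimiter.
    """
    parts = [p for p in path.split(delimiter) if p]
    return [delimiter + delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


def kept_terms(tokens, include, exclude):
    """Keep tokens fully matched by `include` and not fully matched by `exclude`."""
    inc = re.compile(include)
    exc = re.compile(exclude)
    return [t for t in tokens if inc.fullmatch(t) and not exc.fullmatch(t)]


# Same patterns as in the query, without the JSON escaping of "/".
INCLUDE = r"/Smartphone/.*"
EXCLUDE = r"/Smartphone/.*/.*"

paths = ["/Smartphone/Android/5.1", "/Smartphone/Android/6.0", "/Smartphone/IOS"]
tokens = sorted({t for p in paths for t in path_hierarchy_tokens(p)})
buckets = kept_terms(tokens, INCLUDE, EXCLUDE)
```

Under these assumptions, only the direct children of "/Smartphone" survive the two patterns, which is the behaviour the query is after.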

Mapping used:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "path_analyzer": {
          "tokenizer": "path_hierarchy"
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "label": {
          "type": "string",
          "analyzer": "english"
        },
        "categoryPath": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true,
          "fields": {
            "tokenized": {
              "type": "string",
              "analyzer": "path_analyzer"
            }
          }
        }
      }
    }
  }
}

Based on the topic: Aggregation on a materialized path
