Performance impact of the "include/exclude" fields of an aggregation


(Deviantony) #1

Hello there, I'd love to know what the performance impact is of using these fields in an aggregation on a big dataset.

Since they rely on regular expressions, I'm not sure how they would perform against a lot of data.


(Tanguy) #2

Hi,

Can you please give us an example of what you'd like to do?


(Mark Harwood) #3

A single string is interpreted as a regex pattern, which can be slow. Alternatively, an array of values can be passed; these are taken as exact-value filters, which are looked up in a hashmap for docs matching your query.
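
To illustrate the two forms, here is a minimal Python sketch that builds both variants of a terms aggregation body. The field name `color`, the values, and the helper names are hypothetical and only there for illustration; the only thing taken from the description above is that a string `include` is read as a regex and an array is read as a list of exact values.

```python
import json

def regex_terms_agg(field, pattern):
    """Terms aggregation whose `include` is a single string (regex form)."""
    return {"terms": {"field": field, "include": pattern}}

def exact_terms_agg(field, values):
    """Terms aggregation whose `include` is an array (exact-value form)."""
    return {"terms": {"field": field, "include": list(values)}}

# Hypothetical field and values, for illustration only.
body = {
    "aggs": {
        "by_regex": regex_terms_agg("color", "re.*"),
        "by_exact": exact_terms_agg("color", ["red", "blue"]),
    }
}

# Print the JSON that would be sent as the search request body.
print(json.dumps(body, indent=2))
```

If the set of wanted terms is known up front, the second form avoids regex evaluation altogether.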


(Deviantony) #4

Using the following entities:

{
  "label": "Galaxy S4",
  "categoryPath": ["Smartphone/Android/5.1"]
}

{
  "label": "Galaxy S6",
  "categoryPath": ["Smartphone/Android/6.0"]
}

{
  "label": "Iphone 6s",
  "categoryPath": ["Smartphone/IOS"]
}

And the category tree for this example:

| /
| / Smartphone
| / Smartphone / Android
| / Smartphone / Android / 5.1
| / Smartphone / Android / 6.0
| / Smartphone / IOS

What I would like to do is retrieve the number of products per category level, e.g. how many products are located in the "Smartphone" category? I expect it to return two buckets, for the child categories only (Android and IOS).

I'm currently using the following query to retrieve how many products are located in the "Smartphone" category:

GET my_index/product/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "categoryPath.tokenized": "/Smartphone"
        }
      }
    }
  },
  "aggs": {
    "category": {
      "terms": {
        "field": "categoryPath.tokenized",
        "size": 0,
        "include": "\/Smartphone\/.*",
        "exclude": "\/Smartphone\/.*\/.*"
      }
    }
  }
}
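
To make the intent of the include/exclude pair concrete, here is a small Python simulation of the buckets I expect. It is only a sketch and not Elasticsearch code: it assumes the path_hierarchy tokenizer emits one token per path prefix, that the include/exclude patterns have to match the whole term, and that the paths are indexed with a leading slash as in the query. The helper names `path_tokens` and `terms_buckets` are made up for this example.

```python
import re
from collections import Counter

def path_tokens(path):
    """Emit one token per path prefix: /a/b/c -> /a, /a/b, /a/b/c.

    Always emits a leading slash, to match the terms used in the query.
    """
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]

def terms_buckets(docs, include, exclude):
    """Count documents per token, keeping tokens that fully match
    `include` and do not fully match `exclude`."""
    inc, exc = re.compile(include), re.compile(exclude)
    counts = Counter()
    for doc in docs:
        # Use a set so each document is counted at most once per token.
        tokens = set()
        for path in doc["categoryPath"]:
            tokens.update(path_tokens(path))
        for token in tokens:
            if inc.fullmatch(token) and not exc.fullmatch(token):
                counts[token] += 1
    return dict(counts)

# The three example products, written with a leading slash (assumption).
products = [
    {"label": "Galaxy S4", "categoryPath": ["/Smartphone/Android/5.1"]},
    {"label": "Galaxy S6", "categoryPath": ["/Smartphone/Android/6.0"]},
    {"label": "Iphone 6s", "categoryPath": ["/Smartphone/IOS"]},
]

buckets = terms_buckets(products, r"/Smartphone/.*", r"/Smartphone/.*/.*")
for term, count in sorted(buckets.items()):
    print(term, count)
```

The include pattern keeps everything below "/Smartphone", and the exclude pattern drops anything two or more levels below it, which leaves only the direct children.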

Mapping used:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "path_analyzer": {
          "tokenizer": "path_hierarchy"
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "label": {
          "type": "string",
          "analyzer": "english"
        },
        "categoryPath": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true,
          "fields": {
            "tokenized": {
              "type": "string",
              "analyzer": "path_analyzer"
            }
          }
        }
      }
    }
  }
}

Based on the topic: Aggregation on a materialized path

