Terms aggregation - Sort on the relevancy of the terms


(srini P) #1

Hi,
We have a use case where we run a terms aggregation on a multi-valued (array) field. The documents are first filtered using a regex match on this field, and the terms aggregation is then run on the same field.
The terms aggregation returns the terms sorted by doc_count, as expected.
We would like the sorting to be based on the relevancy of each term with respect to the initial regex filter. Is this possible?

Refer to my example below for more clarity.

PUT someindex
{
    "settings": {
        "index": {
            "number_of_replicas": 0,
            "number_of_shards": 1,
            "search.slowlog.threshold.query.debug": "1ms"
        }
    },
    "mappings": {
        "my_type": {
            "dynamic_templates": [
                {
                    "strings": {
                        "mapping": {
                            "type": "keyword"
                        },
                        "match_mapping_type": "string"
                    }
                }
            ]
        }
    }
}

PUT someindex/my_type/1
{
  "title": "document 001",
  "tags":  ["lucene", "search" ]
}

PUT someindex/my_type/2
{
  "title": "document 002",
  "tags":  [ "lucene is a search library", "lucene", "elastic", "search" ]
}

GET someindex/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "regexp": {
            "tags": {
              "value": ".*lucene.*|.*search.*"
            }
          }
        }
      ]
    }
  },
  "aggregations": {
    "requestedDimension": {
      "terms": {
        "field": "tags"
      }
    }
  }
}

Response:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "requestedDimension": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "lucene",
          "doc_count": 2
        },
        {
          "key": "search",
          "doc_count": 2
        },
        {
          "key": "elastic",
          "doc_count": 1
        },
        {
          "key": "lucene is a search library",
          "doc_count": 1
        }
      ]
    }
  }
}

I am looking for a query that returns the response below, since "lucene is a search library" is more relevant to my filter.

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "requestedDimension": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "lucene is a search library",
          "doc_count": 1
        },
        {
          "key": "lucene",
          "doc_count": 2
        },
        {
          "key": "search",
          "doc_count": 2
        },
        {
          "key": "elastic",
          "doc_count": 1
        }
      ]
    }
  }
}

(Luca Wintergerst) #2

Hello Srini,
Unfortunately, that is not possible as far as I know. I will double-check with my colleagues and get back to you.

Scoring is only done as part of the query phase and can't be applied to aggregations.
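
Since scoring cannot drive the bucket order, one possible workaround (not something confirmed in this thread) is to re-rank the returned buckets on the client. Below is a minimal Python sketch. The function name rerank_buckets and the relevance heuristic are assumptions made for illustration: a term counts as more relevant the more alternatives of the filter it contains, with ties broken by doc_count and then by key. The bucket list is copied from the response in the question.

```python
import re


def rerank_buckets(buckets, patterns):
    """Re-order terms-aggregation buckets with a client-side heuristic.

    Assumption: relevance = number of filter alternatives found in the term.
    Ties fall back to doc_count (descending), then to the key (ascending).
    """
    compiled = [re.compile(p) for p in patterns]

    def relevance(bucket):
        # Count how many of the alternatives occur somewhere in the term.
        return sum(1 for rx in compiled if rx.search(bucket["key"]))

    return sorted(
        buckets,
        key=lambda b: (-relevance(b), -b["doc_count"], b["key"]),
    )


# Buckets as returned by the terms aggregation in the question.
buckets = [
    {"key": "lucene", "doc_count": 2},
    {"key": "search", "doc_count": 2},
    {"key": "elastic", "doc_count": 1},
    {"key": "lucene is a search library", "doc_count": 1},
]

# The two alternatives of the filter ".*lucene.*|.*search.*".
ranked = rerank_buckets(buckets, ["lucene", "search"])
for bucket in ranked:
    print(bucket["key"], bucket["doc_count"])
```

This only re-orders the buckets the aggregation actually returned, so with many distinct terms the aggregation's size would have to be large enough to include every candidate term.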

Thank you for providing the exact query and aggregation. This makes it much easier for us to help you.