Elasticsearch: order aggregation by score performance

Mickael_Magniez · February 7, 2017, 1:54pm

Hi,

On Elasticsearch 5.2, I'm trying to sort my aggregation buckets by the maximum score of the matching documents, but performance is terrible (roughly 10x slower) compared to the _count sort.

{
  "query": {
    "match": {"text": "shirt"}
  },
  "aggs": {
    "offers": {
      "terms": {
        "field": "category",
        "order": {
          "max_score.value": "desc"
        }
      },
      "aggs": {
        "max_score": {
          "max": {
            "script": {
              "lang": "painless",
              "inline": "_score"
            }
          }
        }
      }
    }
  }
}

Is there any way to achieve this with decent performance?

I've tried it on Elasticsearch 1.7, and performance is better for this sort.

Best regards

Mark_Harwood · February 7, 2017, 1:59pm

The sampler aggregation may be of interest.
What is the business question you're trying to answer? We can figure out what the best approach is once we understand the question better.
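A minimal sketch of what a sampler could look like for the query in the original post, assuming the same `text` and `category` fields. The sampler restricts its sub-aggregations to the top-scoring documents on each shard (`shard_size`, an illustrative 200 here; the default is 100), and a plain terms agg then counts categories within that sample only. The aggregation names `sample` and `categories` are arbitrary.

```json
{
  "query": {
    "match": { "text": "shirt" }
  },
  "aggs": {
    "sample": {
      "sampler": { "shard_size": 200 },
      "aggs": {
        "categories": {
          "terms": { "field": "category" }
        }
      }
    }
  }
}
```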

Mickael_Magniez · February 7, 2017, 2:07pm

Thanks for your answer.

The sampler is not exactly what I need: I don't want to retrieve only the categories of the most relevant documents, but all categories, ordered by the max (or avg) relevance of the documents in each category.

My index is a typical ecommerce product index, with products belonging to categories. So when I search for "laptop", I want to display the "most relevant" categories.
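For the terms aggs in this thread to work on Elasticsearch 5.x, `category` has to be a `keyword` field, since `text` fields have fielddata disabled by default. A minimal sketch of an index-creation body (sent with `PUT /<index>`), assuming a hypothetical mapping type named `product` and only the two fields used in the queries here:

```json
{
  "mappings": {
    "product": {
      "properties": {
        "text": { "type": "text" },
        "category": { "type": "keyword" }
      }
    }
  }
}
```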

Mark_Harwood · February 7, 2017, 2:22pm

Average relevance may not be the most helpful if your query is in any way sloppy (ORs, fuzzy matching, etc.).

If you search for "blue shirt", the average relevance of the true best category (let's say menswear) would be diluted by the long tail of clothes that only matched part of the query, e.g. the many "blue jeans".
A category with a single match, e.g. electrical, might match on "shirt" (in "shirt press"), which may be the rarer of the two search terms, and therefore score more highly than the average menswear result. Using samples keeps your analysis focused on the relevant results and not the garbage.

Try putting a "significant_terms" agg on the category field under the sampler. This helps account for uneven category sizes and detects an uplift in a category's frequency in the results.
Similarly described docs will match similarly, so you may also want to try the diversified_sampler, diversifying on something like manufacturer, to ensure a healthy sample.
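A sketch combining both suggestions, assuming a hypothetical keyword field named `manufacturer`. The `diversified_sampler` keeps at most `max_docs_per_value` top-scoring documents per manufacturer in each shard's sample (the value 3 and the `shard_size` of 200 are illustrative), and `significant_terms` then scores categories by their uplift in the sample relative to the whole index. Swapping `diversified_sampler` for a plain `sampler` (with only `shard_size`) removes the diversity constraint.

```json
{
  "query": {
    "match": { "text": "blue shirt" }
  },
  "aggs": {
    "sample": {
      "diversified_sampler": {
        "shard_size": 200,
        "field": "manufacturer",
        "max_docs_per_value": 3
      },
      "aggs": {
        "categories": {
          "significant_terms": { "field": "category" }
        }
      }
    }
  }
}
```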

Mickael_Magniez · February 7, 2017, 2:44pm

The sampler solution seems to work for some use cases.

But the doc counts are inaccurate, and results for my use cases seem better when ordering by score. I'd really like to understand why this query performed well on ES 1.7, but not anymore on ES 5.2.

I have one question about the score result of the sampler aggregation.

I have 2 categories, smartphone and accessories. For the keyword "iphone", the query returns a few very well-ranked smartphones and many lower-scored accessories.

With a shard_size of 100 (I have 5 shards), the result is:

"buckets": [
    {
    "key": "smartphone",
    "doc_count": 85,
    "score": 63.43663808983667,
    "bg_count": 4408
    },
    {
    "key": "accesory",
    "doc_count": 529,
    "score": 37.22400913694616,
    "bg_count": 110883
    }
]

But with a shard_size of 1000:

"buckets": [
    {
    "key": "accesory",
    "doc_count": 4634,
    "score": 76.95471502433895,
    "bg_count": 119001
    },
    {
    "key": "smartphone",
    "doc_count": 99,
    "score": 0.8366545210690426,
    "bg_count": 4939
    }
]

Why is the smartphone score so low when I increase shard_size?

Mark_Harwood · February 7, 2017, 3:18pm

You're extending the sample to look at lower-quality matches. This video shows exactly the effects of sample sizes and significance scores: https://www.youtube.com/watch?v=azP15yvbOBA&t=3s You can clearly see when the signal is strong and weak.
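For reference, the `score` and `bg_count` values in the buckets above come from the significant_terms agg. Assuming the default significance heuristic (JLH) was not overridden, a term's score is computed from its foreground count $f$ within the sampled subset of $F$ documents and its background count $b$ within the $B$ documents of the whole index:

$$
\text{JLH}(f, F, b, B) =
\begin{cases}
\left(\dfrac{f}{F} - \dfrac{b}{B}\right) \cdot \dfrac{f/F}{b/B} & \text{if } \dfrac{f}{F} > \dfrac{b}{B} \\[1ex]
0 & \text{otherwise}
\end{cases}
$$

Raising shard_size increases $F$. If most of the extra sampled documents are accessories, the smartphone $f$ barely moves (85 to 99 above) while $F$ grows, so its foreground fraction $f/F$ shrinks. When $f/F$ is much larger than $b/B$, the score behaves roughly like $(f/F)^2 / (b/B)$, so a drop in $f/F$ cuts the score sharply, which is consistent with the fall from about 63 to 0.84.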

However, just because you have a high score from a string-matching perspective (an iphone cable may mention "iphone" many times in the product description), it doesn't mean that is what people searching for "iphone" generally want to find.
To figure out what people who search for "iphone" really want, you need to turn your attention away from the descriptions in your product catalog and pay closer attention to your click logs. We did a demo of this on some real data (search term + clicked product code + category) here [1], which may be of interest.

[1] https://www.elastic.co/elasticon/conf/2016/sf/graph-capabilities-in-the-elastic-stack (see 28 mins into this)

Mickael_Magniez · February 7, 2017, 3:27pm

I already have a complex function score which takes into account:

less weight for accessories
more weight for popular products

So in my result list, the first results come from the smartphone category, not accessory. That's why I want to order my aggregation by score: I know my score is correct.
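A function score of the kind described could look roughly like the sketch below. It is not the poster's actual query: the `popularity` field, the 0.5 weight and the `log2p` modifier are hypothetical, and the category value `accesory` is copied from the bucket keys shown earlier. The first function halves the score of accessory documents; the second multiplies by `log10(popularity + 2)`, which stays positive even for products with no popularity value (`missing: 0`).

```json
{
  "query": {
    "function_score": {
      "query": { "match": { "text": "iphone" } },
      "functions": [
        {
          "filter": { "term": { "category": "accesory" } },
          "weight": 0.5
        },
        {
          "field_value_factor": {
            "field": "popularity",
            "modifier": "log2p",
            "missing": 0
          }
        }
      ],
      "score_mode": "multiply",
      "boost_mode": "multiply"
    }
  }
}
```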

Mark_Harwood · February 7, 2017, 3:31pm

Is that adaptive? If I search for "iphone cable" does that change the weightings used?

Mickael_Magniez · February 7, 2017, 3:37pm

No, but the presence of "cable", which appears in fields with a high weight, is enough for the accessory category to come first.

Mark_Harwood · February 7, 2017, 5:12pm

OK so let's put the quality of the scoring logic aside and consider what you do with the results of these scores.

A system based on "max" score is a first-past-the-post system and could make bad recommendations on the evidence of a single high-scoring outlier. There's little weight-of-evidence to back the suggestion.
A system based on "avg" score of all results in a category will dilute any high scores with the number of low-scores.
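A small worked example with made-up scores (hypothetical, purely to illustrate the two failure modes) for three categories A, B and C:

$$
\begin{aligned}
A &= \{10\}: & \max A &= 10, & \operatorname{avg} A &= 10 \\
B &= \{9, 9, 9, 9\}: & \max B &= 9, & \operatorname{avg} B &= 9 \\
C &= \{9, \underbrace{1, \dots, 1}_{9}\}: & \max C &= 9, & \operatorname{avg} C &= \tfrac{9 + 9 \cdot 1}{10} = 1.8
\end{aligned}
$$

Ranking by max puts A first on the strength of a single document, ahead of B's four strong matches; ranking by avg puts C last even though its best match is as good as any of B's.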

My proposal of a system based on looking at a sample of high-scoring results is the middle ground between these 2 extremes.
If you want accurate doc counts[1] for all categories, by all means use a regular terms agg on all results to get the raw data, but in the same request use a sampler with an embedded significant_terms (or a straight terms) agg on the category field to get a score you can use to rank the importance of these categories.
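A sketch of that single request: `all_categories` gives exact doc counts over every match, while `relevant_categories` ranks categories using only the top-scoring sample. The aggregation names, the terms `size` and the sampler `shard_size` are illustrative; a straight `terms` agg can replace `significant_terms` inside the sampler.

```json
{
  "query": {
    "match": { "text": "iphone" }
  },
  "aggs": {
    "all_categories": {
      "terms": { "field": "category", "size": 100 }
    },
    "top_matches": {
      "sampler": { "shard_size": 200 },
      "aggs": {
        "relevant_categories": {
          "significant_terms": { "field": "category" }
        }
      }
    }
  }
}
```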

[1] Accurate doc counts can sometimes be hard to explain. I know of an ecommerce search consultancy who always advocate making user queries ANDed (i.e. iphone AND cable) so that facet counts make more sense (they would exclude things like HDMI/power/ethernet cables), and they automatically fall back to OR queries if the number of results is low. Users don't necessarily know about the difference between these decisions.
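The ANDed form of a user query can be expressed with the match query's `operator` parameter, as sketched below. The fallback is not part of the query itself: one way to do it, assumed here, is for the application to check `hits.total` in the response and, if it is too low, resend the same query with `"operator": "or"` (the default).

```json
{
  "query": {
    "match": {
      "text": {
        "query": "iphone cable",
        "operator": "and"
      }
    }
  }
}
```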

Mickael_Magniez · February 8, 2017, 8:47am

Thank you for your detailed answer.

But I'd really like to understand why adding score information to the aggregation is so slow.

Here is the profiling information:

Aggregation with score

{
    "category": {
        "terms": {
            "field": "category",
            "size": 100
        },
        "aggs": {
            "score": {
                "max": {
                    "script": {
                        "lang": "painless",
                        "inline": "_score"
                    }
                }
            }
        }
    }
}

Profile result:

{
    "type": "org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$WithHash",
    "description": "category",
    "time": "97.76644000ms",
    "breakdown": {
        "reduce": 0,
        "build_aggregation": 90977021,
        "build_aggregation_count": 1,
        "initialize": 1760,
        "initialize_count": 1,
        "reduce_count": 0,
        "collect": 6773280,
        "collect_count": 14377
    },
    "children": [{
        "type": "org.elasticsearch.search.aggregations.metrics.max.MaxAggregator",
        "description": "score",
        "time": "4.316251000ms",
        "breakdown": {
            "reduce": 0,
            "build_aggregation": 5004,
            "build_aggregation_count": 125,
            "initialize": 315,
            "initialize_count": 1,
            "reduce_count": 0,
            "collect": 4296429,
            "collect_count": 14377
        }
    }]
}

Aggregation with field

{
    "category": {
        "terms": {
            "field": "category",
            "size": 100
        },
        "aggs": {
            "score": {
                "max": {
                    "field": "product_id"
                }
            }
        }
    }
}

Profile result:

{
    "type": "org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$WithHash",
    "description": "category",
    "time": "6.224784000ms",
    "breakdown": {
        "reduce": 0,
        "build_aggregation": 1436553,
        "build_aggregation_count": 1,
        "initialize": 1825,
        "initialize_count": 1,
        "reduce_count": 0,
        "collect": 4771863,
        "collect_count": 14541
    },
    "children": [{
        "type": "org.elasticsearch.search.aggregations.metrics.max.MaxAggregator",
        "description": "score",
        "time": "0.4531270000ms",
        "breakdown": {
            "reduce": 0,
            "build_aggregation": 7501,
            "build_aggregation_count": 123,
            "initialize": 298,
            "initialize_count": 1,
            "reduce_count": 0,
            "collect": 430663,
            "collect_count": 14541
        }
    }]
}
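For anyone reproducing the comparison: profile output like the above is returned when `"profile": true` is set at the top level of the search body. The aggregation timings appear under `profile.shards[].aggregations` in the response, and the `breakdown` values are in nanoseconds (e.g. `build_aggregation: 90977021` is about 91 ms). The post does not show which query was profiled, so the `match` clause below is an assumption.

```json
{
  "profile": true,
  "query": {
    "match": { "text": "shirt" }
  },
  "aggs": {
    "category": {
      "terms": {
        "field": "category",
        "size": 100
      },
      "aggs": {
        "score": {
          "max": {
            "script": {
              "lang": "painless",
              "inline": "_score"
            }
          }
        }
      }
    }
  }
}
```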

system · March 8, 2017, 8:48am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Top hit aggregation with _score sorting	Elasticsearch	3	1693	July 5, 2017
Aggregating, picking top value using ordering then ordering buckets by score	Elasticsearch	2	363	October 8, 2019
Sort aggregation by score	Elasticsearch	1	354	March 23, 2018
Order the bucket by max score of document in Aggregations V1.0.RC2	Elasticsearch	2	941	July 6, 2017
Sort aggregation based on TopHits (ie top 10) average score	Elasticsearch	2	705	May 25, 2021