I'm building a search suggestion feature and need help choosing between terms aggregation and field collapse for my use case.
My Dataset:
- 3 million items in the index
- 100,000+ unique product names (`brandName` in the mappings)
- Users search by typing partial names (autocomplete)
- I only need to return unique name strings (not full documents)
{
  "mappings": {
    "properties": {
      "completionField": {
        "type": "search_as_you_type",
        "max_shingle_size": 3
      },
      "product": {
        "type": "object",
        "properties": {
          "brandName": {
            "type": "text",
            "analyzer": "product_name_analyzer",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
Response Needed:
[
{ "text": "Panadol" },
{ "text": "Advil" },
{ "text": "Aspirin" }
]
Approach 1: Terms Aggregation
{
  "size": 0,
  "_source": false,
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "pana",
            "type": "bool_prefix",
            "fields": ["completionField", "completionField._2gram", "completionField._3gram"]
          }
        },
        {
          "multi_match": {
            "query": "pana",
            "fields": ["product.brandName^4"]
          }
        }
      ]
    }
  },
  "aggs": {
    "unique_brand_names": {
      "terms": {
        "field": "product.brandName.keyword",
        "size": 5,
        "order": { "max_score": "desc" }
      },
      "aggs": {
        "max_score": {
          "max": { "script": "_score" }
        }
      }
    }
  }
}
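For reference, here is a minimal Python sketch of how I would turn the aggregation response into the response shape above. It assumes the usual response layout, where the buckets sit under `aggregations.unique_brand_names.buckets` and each bucket carries the name in `key`. The helper name `suggestions_from_terms_agg` and the sample response values are made up for illustration.

```python
from typing import Any


def suggestions_from_terms_agg(
    response: dict[str, Any], agg_name: str = "unique_brand_names"
) -> list[dict[str, str]]:
    """Map terms-aggregation buckets to [{"text": ...}], keeping bucket order."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    # Buckets arrive already ordered by the max_score sub-aggregation (desc).
    return [{"text": bucket["key"]} for bucket in buckets]


# Hypothetical response fragment, trimmed to the parts the helper reads.
sample_response = {
    "aggregations": {
        "unique_brand_names": {
            "buckets": [
                {"key": "Panadol", "doc_count": 12, "max_score": {"value": 7.1}},
                {"key": "Panadol Extra", "doc_count": 4, "max_score": {"value": 6.3}},
            ]
        }
    }
}

print(suggestions_from_terms_agg(sample_response))
```

Since `size` is 0, no hits are returned at all, so the buckets are the only payload that needs to be parsed.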
Approach 2: Field Collapse
{
  "size": 5,
  "_source": ["product.brandName"],
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "pana",
            "type": "bool_prefix",
            "fields": ["completionField", "completionField._2gram", "completionField._3gram"]
          }
        },
        {
          "multi_match": {
            "query": "pana",
            "fields": ["product.brandName^4"]
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "product.brandName.keyword"
  },
  "sort": ["_score"]
}
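The equivalent post-processing for the collapse response, again as a rough Python sketch. It assumes each collapsed hit exposes the collapse key under `fields["product.brandName.keyword"]` as a one-element list, which is how collapsed hits are normally returned. The helper name `suggestions_from_collapse` and the sample hits are made up for illustration.

```python
from typing import Any


def suggestions_from_collapse(
    response: dict[str, Any], field: str = "product.brandName.keyword"
) -> list[dict[str, str]]:
    """Map collapsed hits to [{"text": ...}], keeping the hit (score) order."""
    suggestions = []
    for hit in response.get("hits", {}).get("hits", []):
        # The collapse key is returned as a list of values under "fields".
        values = hit.get("fields", {}).get(field, [])
        if values:
            suggestions.append({"text": values[0]})
    return suggestions


# Hypothetical response fragment, trimmed to the parts the helper reads.
sample_response = {
    "hits": {
        "hits": [
            {"_score": 7.1, "fields": {"product.brandName.keyword": ["Panadol"]}},
            {"_score": 6.3, "fields": {"product.brandName.keyword": ["Panadol Extra"]}},
            {"_score": 1.0},  # no collapse key on this hit, so it is skipped
        ]
    }
}

print(suggestions_from_collapse(sample_response))
```

If the name is read from `fields` like this, the `_source` filter in the request is not strictly needed for my use case.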
Questions:
- For returning only unique string values (not documents), which approach is more efficient at this scale?
- Which uses less memory per query?
- Which provides more accurate relevance ordering?
- Are there alternative approaches I should consider?
Benchmark results so far:
Terms aggregation is ~2x faster than collapse on average