I'm trying to set up Elasticsearch for categorized product search, but I'm having trouble wrapping my head around aggregations, even though I believe my use case is fairly common.
I have a set of items, each of which has one or more categories, one of them being the primary category. A document looks like this:
{
  "title": "Bike",
  "description": "Some vehicle",
  "category": {
    "uuid": "a3c88bf9-7597-41d3-925d-0b3cc6f2cdfd",
    "title": "Road vehicles"
  },
  "categories": [
    {
      "uuid": "a3c88bf9-7597-41d3-925d-0b3cc6f2cdfd",
      "title": "Road vehicles"
    },
    {
      "uuid": "232fc934-18f6-48df-952c-ae6f17550d1e",
      "title": "Vehicles"
    }
  ]
}
(Here "category" is the primary category, and "categories" contains all of the item's categories, including the primary one.)
When searching, I would like to group the items by their primary category and, for each category, return a count of how many products are in it, including those that have it as a secondary category.
However, each item should be listed only once, under its primary category.
The search result should be split into categories, sorted by each category's maximum _score in descending order, and within each category the top hits should be ordered by _score.
Something like:
ROAD VEHICLES (count = 2)
* Bike
MOTORIZED VEHICLES (count = 2)
* Car
FLYING VEHICLES (count = 1)
* Aeroplane
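To make the expected result concrete, here is a minimal plain-Python sketch of the grouping I'm after. It is not an Elasticsearch query. Category names stand in for UUIDs, and the `score` values, the `primary`/`all` keys, and the `group_by_primary` helper are made up for illustration:

```python
from collections import defaultdict

# Simplified stand-ins for the three test documents; category names are used
# in place of UUIDs, and the scores are made-up values for illustration only.
ITEMS = [
    {"title": "Bike", "score": 3.0, "primary": "Road vehicles",
     "all": ["Road vehicles", "Vehicles"]},
    {"title": "Car", "score": 2.0, "primary": "Motorized vehicles",
     "all": ["Motorized vehicles", "Road vehicles", "Vehicles"]},
    {"title": "Aeroplane", "score": 1.0, "primary": "Flying vehicles",
     "all": ["Flying vehicles", "Motorized vehicles", "Vehicles"]},
]


def group_by_primary(items, top_n=3):
    """Group items by primary category; count members across all categories."""
    counts = defaultdict(int)   # category -> items having it as primary or secondary
    groups = defaultdict(list)  # primary category -> items listed under it
    for item in items:
        for name in item["all"]:
            counts[name] += 1
        groups[item["primary"]].append(item)

    result = []
    for name, members in groups.items():
        # Within a category, the best-scoring items come first.
        members = sorted(members, key=lambda i: i["score"], reverse=True)
        result.append({
            "category": name,
            "count": counts[name],
            "max_score": members[0]["score"],
            "hits": [m["title"] for m in members[:top_n]],
        })
    # Categories are ordered by their best-scoring item, highest first.
    result.sort(key=lambda g: g["max_score"], reverse=True)
    return result


if __name__ == "__main__":
    for group in group_by_primary(ITEMS):
        print(f'{group["category"].upper()} (count = {group["count"]})')
        for title in group["hits"]:
            print(f"* {title}")
```

Note that "Vehicles" never shows up as a group, because no item has it as its primary category, even though every item counts towards it.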
Maybe it would make things easier to move the category UUID one level up (or use include_in_root/include_in_parent), but I think my lack of understanding goes further than that.
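By "moving the UUID one level up" I mean something like the following sketch, which would run on each document before indexing. The field names `primary_category_uuid` and `category_uuids` and the `flatten_categories` helper are hypothetical, and I have not tried this against my index:

```python
# Example document in the shape used in this question.
DOC = {
    "title": "Bike",
    "description": "Some vehicle",
    "category": {
        "uuid": "a3c88bf9-7597-41d3-925d-0b3cc6f2cdfd",
        "title": "Road vehicles",
    },
    "categories": [
        {"uuid": "a3c88bf9-7597-41d3-925d-0b3cc6f2cdfd", "title": "Road vehicles"},
        {"uuid": "232fc934-18f6-48df-952c-ae6f17550d1e", "title": "Vehicles"},
    ],
}


def flatten_categories(doc):
    """Return a copy of doc with the category UUIDs lifted to top-level fields.

    The two added field names are hypothetical, chosen only for this sketch.
    """
    flat = dict(doc)  # shallow copy, so the input document is not modified
    flat["primary_category_uuid"] = doc["category"]["uuid"]
    flat["category_uuids"] = [c["uuid"] for c in doc["categories"]]
    return flat


if __name__ == "__main__":
    flat = flatten_categories(DOC)
    print(flat["primary_category_uuid"])
    print(flat["category_uuids"])
```

Both new fields could then be mapped as plain keyword fields, but I don't know whether that alone would get me the grouping and counts described above.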
I used the following to create the index and its mapping (httpie + bash):
http put localhost:9200/my_test_index
http put localhost:9200/my_test_index/_mapping/_doc <<< '{
  "properties": {
    "title": {
      "type": "text"
    },
    "description": {
      "type": "text"
    },
    "category": {
      "type": "nested",
      "properties": {
        "uuid": {
          "type": "keyword"
        },
        "title": {
          "type": "text"
        }
      }
    },
    "categories": {
      "type": "nested",
      "properties": {
        "uuid": {
          "type": "keyword"
        },
        "title": {
          "type": "text"
        }
      }
    }
  }
}'
Three dummy products:
http put localhost:9200/my_test_index/_doc/1 <<< '{
  "title": "Bike",
  "description": "Some vehicle",
  "category": {
    "uuid": "a3c88bf9-7597-41d3-925d-0b3cc6f2cdfd",
    "title": "Road vehicles"
  },
  "categories": [
    {
      "uuid": "a3c88bf9-7597-41d3-925d-0b3cc6f2cdfd",
      "title": "Road vehicles"
    },
    {
      "uuid": "232fc934-18f6-48df-952c-ae6f17550d1e",
      "title": "Vehicles"
    }
  ]
}'
http put localhost:9200/my_test_index/_doc/2 <<< '{
  "title": "Car",
  "description": "Another vehicle",
  "category": {
    "uuid": "9916aaf9-5955-4a23-9dba-0805a43104bf",
    "title": "Motorized vehicles"
  },
  "categories": [
    {
      "uuid": "9916aaf9-5955-4a23-9dba-0805a43104bf",
      "title": "Motorized vehicles"
    },
    {
      "uuid": "a3c88bf9-7597-41d3-925d-0b3cc6f2cdfd",
      "title": "Road vehicles"
    },
    {
      "uuid": "232fc934-18f6-48df-952c-ae6f17550d1e",
      "title": "Vehicles"
    }
  ]
}'
http put localhost:9200/my_test_index/_doc/3 <<< '{
  "title": "Aeroplane",
  "description": "Yet another vehicle",
  "category": {
    "uuid": "60c7a30c-da41-4592-b304-0e87a9f8d6a7",
    "title": "Flying vehicles"
  },
  "categories": [
    {
      "uuid": "60c7a30c-da41-4592-b304-0e87a9f8d6a7",
      "title": "Flying vehicles"
    },
    {
      "uuid": "9916aaf9-5955-4a23-9dba-0805a43104bf",
      "title": "Motorized vehicles"
    },
    {
      "uuid": "232fc934-18f6-48df-952c-ae6f17550d1e",
      "title": "Vehicles"
    }
  ]
}'
And this is as far as my query got:
http get localhost:9200/my_test_index/_search <<< '{
  "query": {
    "multi_match": {
      "query": "vehicle",
      "fields": [
        "title^3",
        "description"
      ],
      "type": "cross_fields",
      "operator": "and"
    }
  },
  "aggs": {
    "category1": {
      "nested": {
        "path": "category"
      },
      "aggs": {
        "category2": {
          "terms": {
            "field": "category.uuid",
            "order": {
              "max_score": "desc"
            },
            "size": 6
          },
          "aggs": {
            "max_score": {
              "max": {
                "script": "_score"
              }
            },
            "category3": {
              "top_hits": {
                "size": 3,
                "sort": {
                  "_score": "desc"
                }
              }
            },
            "aggs": {
              "reverse_nested": {},
              "aggs": {
                "category4": {
                  "nested": {
                    "path": "categories"
                  },
                  "aggs": {
                    "category5": {
                      "terms": {
                        "field": "categories.uuid"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}'
Any pointers on what I should look into?