Index creation:
PUT file_path
{
  "mappings": {
    "properties": {
      "keyword": {
        "type": "text"
      },
      "search_volume": {
        "type": "integer"
      },
      "cpc": {
        "type": "integer"
      },
      "score": {
        "type": "float"
      },
      "category": {
        "type": "keyword"
      },
      "category_path": {
        "type": "keyword"
      },
      "position": {
        "type": "float"
      },
      "url": {
        "type": "keyword"
      }
    }
  }
}
Some sample documents:
{
  "_index" : "file_path",
  "_type" : "_doc",
  "_id" : "Tgk1-nABTUHKytkMmmyr",
  "_score" : 10.267364,
  "_source" : {
    "keyword" : "mp4ba movies",
    "search_volume" : 20,
    "cpc" : 0,
    "score" : 24.26,
    "category" : "Movies",
    "category_path" : "/Business/Arts_and_Entertainment/Models/Individual/B/Bellucci,_Monica/Movies",
    "position" : 20.0,
    "url" : "http://mysitetester.com/movies"
  }
},
{
  "_index" : "file_path",
  "_type" : "_doc",
  "_id" : "5Ak1-nABTUHKytkMGw6m",
  "_score" : 10.267364,
  "_source" : {
    "keyword" : "marital infidelity movies",
    "search_volume" : 50,
    "cpc" : 0,
    "score" : 39.2635,
    "category" : "Movies",
    "category_path" : "/Arts/Movies/Studios/Warner_Bros./Movies",
    "position" : 17.0,
    "url" : "http://mysitetester.com/movies"
  }
},
{
  "_index" : "file_path",
  "_type" : "_doc",
  "_id" : "8wk1-nABTUHKytkMGw6n",
  "_score" : 10.267364,
  "_source" : {
    "keyword" : "devotional movies",
    "search_volume" : 480,
    "cpc" : 0,
    "score" : 56.548,
    "category" : "Movies",
    "category_path" : "/Arts/Movies/Studios/Warner_Bros./Movies",
    "position" : 9.0,
    "url" : "http://mysitetester.com/movies"
  }
}
The query I am using to aggregate the top level of the category_path:
GET file_path/_search?size=0
{
  "aggs": {
    "tree": {
      "path_hierarchy": {
        "field": "category_path",
        "separator": "/",
        "max_depth": 0,
        "size": 30
      },
      "aggs": {
        "search_volumes": {
          "avg": {
            "field": "search_volume"
          }
        }
      }
    }
  }
}
And its response:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "tree" : {
      "buckets" : [
        {
          "key" : "World",
          "doc_count" : 308153,
          "path" : [ ],
          "search_volumes" : {
            "value" : 53.68258624774057
          }
        },
        {
          "key" : "Arts",
          "doc_count" : 206891,
          "path" : [ ],
          "search_volumes" : {
            "value" : 63.65835149909856
          }
        },
        {
          "key" : "Regional",
          "doc_count" : 107047,
          "path" : [ ],
          "search_volumes" : {
            "value" : 58.696553850177956
          }
        },
        {
          "key" : "Business",
          "doc_count" : 90660,
          "path" : [ ],
          "search_volumes" : {
            "value" : 51.84634899624972
          }
        },
        {
          "key" : "Computers",
          "doc_count" : 82783,
          "path" : [ ],
          "search_volumes" : {
            "value" : 56.790524624621
          }
        },
        {
          "key" : "Society",
          "doc_count" : 64092,
          "path" : [ ],
          "search_volumes" : {
            "value" : 58.82512638082756
          }
        },
        {
          "key" : "Games",
          "doc_count" : 57919,
          "path" : [ ],
          "search_volumes" : {
            "value" : 62.482432362437194
          }
        },
        {
          "key" : "Science",
          "doc_count" : 46828,
          "path" : [ ],
          "search_volumes" : {
            "value" : 55.13837874775775
          }
        },
        {
          "key" : "Reference",
          "doc_count" : 33955,
          "path" : [ ],
          "search_volumes" : {
            "value" : 55.791783242526876
          }
        },
        {
          "key" : "Sports",
          "doc_count" : 22052,
          "path" : [ ],
          "search_volumes" : {
            "value" : 60.563214220932345
          }
        },
        {
          "key" : "Recreation",
          "doc_count" : 18459,
          "path" : [ ],
          "search_volumes" : {
            "value" : 60.06121675063655
          }
        },
        {
          "key" : "Health",
          "doc_count" : 18455,
          "path" : [ ],
          "search_volumes" : {
            "value" : 67.62015713898673
          }
        },
        {
          "key" : "Shopping",
          "doc_count" : 11977,
          "path" : [ ],
          "search_volumes" : {
            "value" : 61.9195123987643
          }
        },
        {
          "key" : "Home",
          "doc_count" : 9212,
          "path" : [ ],
          "search_volumes" : {
            "value" : 64.33239253148068
          }
        },
        {
          "key" : "News",
          "doc_count" : 689,
          "path" : [ ],
          "search_volumes" : {
            "value" : 60.711175616835995
          }
        }
      ]
    }
  }
}
I am using this plugin:
https://github.com/opendatasoft/elasticsearch-aggregation-pathhierarchy
The query above is what I am using to see the top level, by setting max_depth: 0. The goal is to be able to query through each level below a given top-level category. For example, given these paths:
/Business/Arts_and_Entertainment/Models/Individual/B/Bellucci,_Monica/Movies
/Business/Arts_and_Entertainment/Actors/Individual/H/Hanks,_Tom/Producer
/Business/Financial/Companies/Banks/Wells_Fargo
Going down one level from Business, the buckets returned would be Arts_and_Entertainment and Financial, with their respective document counts and whatever metrics are added to the query (e.g. sum of score, cpc, etc.).
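To make the desired result concrete, here is a minimal client-side sketch in Python of the drill-down I am after. It is not an Elasticsearch query and not how the plugin works internally; `child_buckets` is a hypothetical helper name, and it only computes document counts (no metrics) over the three example paths above.

```python
from collections import Counter


def child_buckets(paths, parent, separator="/"):
    """Count documents per path segment directly below `parent`.

    Hypothetical helper for illustration only; it mimics the bucket
    keys and doc counts expected from a one-level drill-down.
    """
    parent_parts = [p for p in parent.split(separator) if p]
    depth = len(parent_parts)
    counts = Counter()
    for path in paths:
        parts = [p for p in path.split(separator) if p]
        # Keep only paths under `parent` that have at least one more level.
        if parts[:depth] == parent_parts and len(parts) > depth:
            counts[parts[depth]] += 1
    return dict(counts)


paths = [
    "/Business/Arts_and_Entertainment/Models/Individual/B/Bellucci,_Monica/Movies",
    "/Business/Arts_and_Entertainment/Actors/Individual/H/Hanks,_Tom/Producer",
    "/Business/Financial/Companies/Banks/Wells_Fargo",
]

print(child_buckets(paths, "/Business"))
# {'Arts_and_Entertainment': 2, 'Financial': 1}
```

Calling the same helper with "/Business/Financial" would return the next level down (Companies), which is the kind of level-by-level navigation I would like to get from the aggregation itself.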
I have tried to sub-aggregate a bucket using the bucket key returned, but have not gotten that query to work. What I have found is that I am not able to use _key with bucket_selector and buckets_path.
Let me know if there is anything else that would help explain my situation.
Thanks!