I found a superbly wonderful way to do this on the client (Java client)
side! First, I started with the excellent suggestion made by Lukáš Vlček at:
http://elasticsearch-users.115913.n3.nabble.com/facet-and-grouping-td4020055.html
I therefore implemented a follow-on client-side grouping hierarchy in Java
(rather than the JavaScript that suggestion used). It works beautifully!
Some hints:
The groupings are separated by the ~~~ string because that's what was
suggested and it works. So a quoted "~~~" Java Pattern was used to split
the resulting terms.
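As a rough illustration of the splitting step, here is a minimal sketch. It assumes the "quoted" pattern means Pattern.quote; the class and method names (TermSplitter, splitTerm) are made up for this example and are not from my actual code.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class TermSplitter {
    // Pattern.quote turns the separator into a literal, so it stays safe
    // even if the separator is later changed to contain regex metacharacters.
    private static final Pattern SEPARATOR = Pattern.compile(Pattern.quote("~~~"));

    /** Splits one facet term such as "fl~~~beach" into its hierarchy levels. */
    static String[] splitTerm(String term) {
        // A limit of -1 keeps trailing empty strings, so an empty last
        // field is not silently dropped.
        return SEPARATOR.split(term, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitTerm("fl~~~beach"))); // [fl, beach]
    }
}
```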
But how to handle the sorting? I keep the hierarchy in a map, but use a
Java LinkedHashMap instead of a simple HashMap. That way, every entry is
added in the same sorted order as returned by Elasticsearch. So the
top-level counts are in order, the child counts under them are in order
relative to the other children, and so on.
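To make the ordering point concrete, here is a small sketch for the two-level state/city case. It assumes Java 8 or later (for computeIfAbsent) and that the terms arrive already sorted by count; OrderedGroups and group are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class OrderedGroups {
    /**
     * Groups "parent~~~child" terms under their parent. LinkedHashMap keeps
     * insertion order, so parents and children stay in the order received.
     */
    static Map<String, Map<String, Integer>> group(List<String> terms, List<Integer> counts) {
        Map<String, Map<String, Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < terms.size(); i++) {
            String[] parts = terms.get(i).split(Pattern.quote("~~~"), 2);
            if (parts.length < 2) {
                continue; // assumption: terms without a separator are skipped
            }
            groups.computeIfAbsent(parts[0], k -> new LinkedHashMap<>())
                  .put(parts[1], counts.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("fl~~~beach", "tx~~~citi", "mo~~~citi", "fl~~~lake");
        List<Integer> counts = Arrays.asList(61, 36, 36, 25);
        // prints {fl={beach=61, lake=25}, tx={citi=36}, mo={citi=36}}
        System.out.println(group(terms, counts));
    }
}
```

With a plain HashMap in place of LinkedHashMap the same entries would be present, but their iteration order would no longer be guaranteed to match the order returned by Elasticsearch.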
As an example, I loaded the US Census cities into an ES type with "state"
and "city" fields containing the state abbreviations and city names,
respectively. Here is the query:
{
"from" : 0,
"size" : 50000,
"query" : {
"match_all" : { }
},
"version" : true,
"explain" : false,
"facets" : {
"state_city_combinations" : {
"terms" : {
"size" : 100,
"script" : "doc['state'].value + \"~~~\" + doc['city'].value"
}
}
}
}
When generating the response:
- The resulting terms are lowercased and stemmed using the English
  snowball analyzer. So Elasticsearch creates citi for City, and so on. I
  suppose I could have an additional non-stemmed copy of that field, but for
  now this works fine for me. However, I would welcome any suggestions on the
  best approach to this.
- I read that it's Not Good to use a Terms facet on a field that is
  analyzed into multiple words, like my "city" field. But it seems to work
  well enough. Again, any suggestions would be welcomed.
- I did not find an easy way to emit the TermsFacet response as JSON, so I
  wrote my own version to match what the HTTP REST interface generates. But
  then, as a child of the terms facet object, I added the
  "combinations" : { } object. I wanted the "combinations" but left in the
  "terms" : [ ] array for testing; no big deal.
- My Java code made only one pass through the terms, with a tiny bit of
  recursion to add each child to its parent's LinkedHashMap (recursion depth
  was limited to the total number of fields in the hierarchy, which is
  typically very small).
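The one-pass recursion from the last point can be sketched roughly as follows. This is not my exact code: FacetHierarchy, add and build are illustrative names, Java 8+ is assumed, and the sketch assumes every term has the same number of levels (a term that is both a leaf and a parent would fail the cast).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class FacetHierarchy {
    private static final Pattern SEPARATOR = Pattern.compile(Pattern.quote("~~~"));

    /**
     * Adds one split term to the tree. Inner values are nested LinkedHashMaps,
     * leaf values are Integer counts. Recursion depth equals the number of
     * fields in the hierarchy.
     */
    @SuppressWarnings("unchecked")
    static void add(Map<String, Object> node, String[] path, int depth, int count) {
        String key = path[depth];
        if (depth == path.length - 1) {
            node.put(key, count); // last level: store the count
            return;
        }
        // Create the child map on first sight of this key, then descend.
        Map<String, Object> child = (Map<String, Object>)
                node.computeIfAbsent(key, k -> new LinkedHashMap<String, Object>());
        add(child, path, depth + 1, count);
    }

    /** Single pass over the terms, in the order they were returned. */
    static Map<String, Object> build(String[] terms, int[] counts) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            add(root, SEPARATOR.split(terms[i], -1), 0, counts[i]);
        }
        return root;
    }

    public static void main(String[] args) {
        String[] terms = {"fl~~~beach", "tx~~~citi", "fl~~~lake"};
        int[] counts = {61, 36, 25};
        // prints {fl={beach=61, lake=25}, tx={citi=36}}
        System.out.println(build(terms, counts));
    }
}
```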
And here is the response, which looks just like the HTTP REST response but
with the added "combinations" : { } object:
{
"facets" : {
"state_city_combinations" : {
"_type" : "terms",
"total" : 25376,
"other" : 24346,
"missing" : 0,
"combinations" : {
"fl" : {
"beach" : 61,
"lake" : 25,
"citi" : 21,
"estat" : 9,
"fort" : 6,
"bay" : 6,
"east" : 5,
"creek" : 4
},
"tx" : {
"citi" : 36,
"creek" : 10,
"oak" : 9,
"la" : 9,
"hill" : 8,
"bay" : 6,
"grove" : 5,
"spring" : 4,
"park" : 4,
"falcon" : 4,
"cross" : 4,
"acr" : 4
},
"mo" : {
"citi" : 36,
"lake" : 7,
"creek" : 4
},
"il" : {
"citi" : 29,
"hill" : 9,
"grove" : 9,
"lake" : 8
},
"mn" : {
"lake" : 27,
"fall" : 6,
"citi" : 4
},
"wi" : {
"lake" : 25,
"citi" : 8,
"fall" : 7
},
"ny" : {
"east" : 25,
"fall" : 16,
"lake" : 11,
"bay" : 7,
"north" : 6,
"new" : 5,
"harbor" : 5,
"beach" : 5,
"hill" : 4
},
"ia" : {
"citi" : 22,
"center" : 4
},
"ca" : {
"citi" : 21,
"beach" : 20,
"hill" : 17,
"lake" : 8,
"east" : 8
},
"pa" : {
"citi" : 19,
"east" : 17,
"hill" : 16,
"height" : 11,
"mount" : 7,
"new" : 4,
"beaver" : 4
},
"ok" : {
"citi" : 18,
"creek" : 8,
"grove" : 4,
"acr" : 4
},
"wa" : {
"lake" : 16,
"citi" : 7,
"east" : 4,
"creek" : 4
},
"oh" : {
"citi" : 16,
"hill" : 14,
"height" : 13,
"new" : 7,
"center" : 6,
"north" : 5,
"lake" : 4,
"fall" : 4
},
"mi" : {
"lake" : 16,
"citi" : 12
},
"or" : {
"citi" : 14
},
"ks" : {
"citi" : 13
},
"ak" : {
"bay" : 13
},
"ne" : {
"citi" : 11
},
"tn" : {
"citi" : 10,
"hill" : 7
},
"nj" : {
"citi" : 10,
"beach" : 9,
"lake" : 5,
"height" : 4
},
"in" : {
"citi" : 10,
"new" : 4
},
"ct" : {
"center" : 9
},
"ky" : {
"hill" : 8
},
"ga" : {
"citi" : 8
},
"ut" : {
"citi" : 6,
"lake" : 4
},
"sd" : {
"citi" : 6,
"lake" : 4
},
"nc" : {
"citi" : 6,
"beach" : 5
},
"al" : {
"citi" : 5
},
"sc" : {
"beach" : 4
},
"pr" : {
"la" : 4
},
"md" : {
"chase" : 4
},
"de" : {
"beach" : 4
}
},
"terms" : [ {
"term" : "fl~beach",
"count" : 61
}, {
"term" : "tx~citi",
"count" : 36
}, {
"term" : "mo~citi",
"count" : 36
}, {
"term" : "il~citi",
"count" : 29
}, {
"term" : "mn~lake",
"count" : 27
}, {
"term" : "wi~lake",
"count" : 25
}, {
"term" : "ny~east",
"count" : 25
}, {
"term" : "fl~lake",
"count" : 25
}, {
"term" : "ia~citi",
"count" : 22
}, {
"term" : "fl~citi",
"count" : 21
}, {
"term" : "ca~citi",
"count" : 21
}, {
"term" : "ca~beach",
"count" : 20
}, {
"term" : "pa~citi",
"count" : 19
}, {
"term" : "ok~citi",
"count" : 18
}, {
"term" : "pa~east",
"count" : 17
}, {
"term" : "ca~hill",
"count" : 17
}, {
"term" : "wa~lake",
"count" : 16
}, {
"term" : "pa~hill",
"count" : 16
}, {
"term" : "oh~citi",
"count" : 16
}, {
"term" : "ny~fall",
"count" : 16
}, {
"term" : "mi~lake",
"count" : 16
}, {
"term" : "or~citi",
"count" : 14
}, {
"term" : "oh~hill",
"count" : 14
}, {
"term" : "oh~height",
"count" : 13
}, {
"term" : "ks~citi",
"count" : 13
}, {
"term" : "ak~bay",
"count" : 13
}, {
"term" : "mi~citi",
"count" : 12
}, {
"term" : "pa~height",
"count" : 11
}, {
"term" : "ny~lake",
"count" : 11
}, {
"term" : "ne~citi",
"count" : 11
}, {
"term" : "tx~creek",
"count" : 10
}, {
"term" : "tn~citi",
"count" : 10
}, {
"term" : "nj~citi",
"count" : 10
}, {
"term" : "in~citi",
"count" : 10
}, {
"term" : "tx~oak",
"count" : 9
}, {
"term" : "tx~la",
"count" : 9
}, {
"term" : "nj~beach",
"count" : 9
}, {
"term" : "il~hill",
"count" : 9
}, {
"term" : "il~grove",
"count" : 9
}, {
"term" : "fl~estat",
"count" : 9
}, {
"term" : "ct~center",
"count" : 9
}, {
"term" : "wi~citi",
"count" : 8
}, {
"term" : "tx~hill",
"count" : 8
}, {
"term" : "ok~creek",
"count" : 8
}, {
"term" : "ky~hill",
"count" : 8
}, {
"term" : "il~lake",
"count" : 8
}, {
"term" : "ga~citi",
"count" : 8
}, {
"term" : "ca~lake",
"count" : 8
}, {
"term" : "ca~east",
"count" : 8
}, {
"term" : "wi~fall",
"count" : 7
}, {
"term" : "wa~citi",
"count" : 7
}, {
"term" : "tn~hill",
"count" : 7
}, {
"term" : "pa~mount",
"count" : 7
}, {
"term" : "oh~new",
"count" : 7
}, {
"term" : "ny~bay",
"count" : 7
}, {
"term" : "mo~lake",
"count" : 7
}, {
"term" : "ut~citi",
"count" : 6
}, {
"term" : "tx~bay",
"count" : 6
}, {
"term" : "sd~citi",
"count" : 6
}, {
"term" : "oh~center",
"count" : 6
}, {
"term" : "ny~north",
"count" : 6
}, {
"term" : "nc~citi",
"count" : 6
}, {
"term" : "mn~fall",
"count" : 6
}, {
"term" : "fl~fort",
"count" : 6
}, {
"term" : "fl~bay",
"count" : 6
}, {
"term" : "tx~grove",
"count" : 5
}, {
"term" : "oh~north",
"count" : 5
}, {
"term" : "ny~new",
"count" : 5
}, {
"term" : "ny~harbor",
"count" : 5
}, {
"term" : "ny~beach",
"count" : 5
}, {
"term" : "nj~lake",
"count" : 5
}, {
"term" : "nc~beach",
"count" : 5
}, {
"term" : "fl~east",
"count" : 5
}, {
"term" : "al~citi",
"count" : 5
}, {
"term" : "wa~east",
"count" : 4
}, {
"term" : "wa~creek",
"count" : 4
}, {
"term" : "ut~lake",
"count" : 4
}, {
"term" : "tx~spring",
"count" : 4
}, {
"term" : "tx~park",
"count" : 4
}, {
"term" : "tx~falcon",
"count" : 4
}, {
"term" : "tx~cross",
"count" : 4
}, {
"term" : "tx~acr",
"count" : 4
}, {
"term" : "sd~lake",
"count" : 4
}, {
"term" : "sc~beach",
"count" : 4
}, {
"term" : "pr~la",
"count" : 4
}, {
"term" : "pa~new",
"count" : 4
}, {
"term" : "pa~beaver",
"count" : 4
}, {
"term" : "ok~grove",
"count" : 4
}, {
"term" : "ok~acr",
"count" : 4
}, {
"term" : "oh~lake",
"count" : 4
}, {
"term" : "oh~fall",
"count" : 4
}, {
"term" : "ny~hill",
"count" : 4
}, {
"term" : "nj~height",
"count" : 4
}, {
"term" : "mo~creek",
"count" : 4
}, {
"term" : "mn~citi",
"count" : 4
}, {
"term" : "md~chase",
"count" : 4
}, {
"term" : "in~new",
"count" : 4
}, {
"term" : "ia~center",
"count" : 4
}, {
"term" : "fl~creek",
"count" : 4
}, {
"term" : "de~beach",
"count" : 4
} ]
}
}
}
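For anyone wanting to emit a "combinations" object like the one above, a hand-rolled writer along these lines may be enough. It is only a sketch under some assumptions: the hierarchy is held as nested maps with numeric leaf values, only quotes and backslashes in keys need escaping, and the output is compact rather than pretty-printed. CombinationsJson and toJson are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CombinationsJson {
    /** Renders nested maps as a JSON object, preserving map iteration order. */
    static String toJson(Map<String, ?> node) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, ?> e : node.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":");
            Object value = e.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) value;
                sb.append(toJson(child)); // recurse into the next level
            } else {
                sb.append(value); // assumption: leaf values are numeric counts
            }
        }
        return sb.append('}').toString();
    }

    /** Minimal escaping: backslashes first, then double quotes. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, Object> fl = new LinkedHashMap<>();
        fl.put("beach", 61);
        fl.put("lake", 25);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("fl", fl);
        System.out.println(toJson(root)); // {"fl":{"beach":61,"lake":25}}
    }
}
```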
On Wednesday, February 27, 2013 6:00:21 AM UTC-5, Clinton Gormley wrote:
> > But in this case, I have to do:
> > 1 + (top genres) queries.
> > Is there a way that I can get all the info in one fetch?
>
> No. Currently we don't have hierarchical facets. It is on the todo list.
>
> clint