Get number of all possible facets (but without producing them)


(pere roca ristol) #1

Is there a way to get the "top 10" facets and, at the same time, get the
*count* of all the possible facets we could have created
(count(distinct(kingdom)) in SQL terms; in this case, all the distinct
kingdoms in our documents)?

I know I could increase "size" to a very large number, count how many
facets come back, and present only the first 10, but that generates a very
large JSON response, and the processing cost may be much higher.

A simple facet:

"facets" : {
  "kingdom": {
    "terms": {
      "field": "kingdom",
      "size": 10
    }
  }
}

I mean, I don't want to get something like {term:kingdom1, count:100} for
each possible facet (kingdom). I just want the first ten kingdoms (the
most popular ones), with their names and counts, plus some information
about how many possible facets I could have generated.
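
For reference, the brute-force workaround mentioned above (request a very large "size", count the terms client-side, show only the first 10) might look roughly like the sketch below. It assumes the response has a "facets" -> facet name -> "terms" list of {term, count} entries; the function name `summarize_terms_facet` and the sample data are hypothetical, and no request is actually sent.

```python
def summarize_terms_facet(response, facet_name, top_n=10):
    """Return (top_terms, distinct_count) from a terms facet response.

    Assumes the facet was requested with a "size" large enough to return
    every term, so the length of the "terms" list is the distinct count.
    """
    terms = response["facets"][facet_name]["terms"]
    # Sort defensively by descending count, then keep only the first top_n.
    top_terms = sorted(terms, key=lambda t: t["count"], reverse=True)[:top_n]
    return top_terms, len(terms)


# Hypothetical response for a "kingdom" facet (made-up data).
sample = {
    "facets": {
        "kingdom": {
            "_type": "terms",
            "missing": 0,
            "other": 0,
            "terms": [
                {"term": "Animalia", "count": 120},
                {"term": "Plantae", "count": 80},
                {"term": "Fungi", "count": 15},
            ],
        }
    }
}

top, distinct = summarize_terms_facet(sample, "kingdom", top_n=2)
print([t["term"] for t in top], distinct)
```

This still pays the cost of transferring every term, which is exactly the overhead the question is trying to avoid.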

I hope that makes sense. Thanks in advance.

Pere

--


(Raffaele Sena) #2

Wouldn't the "other" field accomplish what you need (or other + 10)?
If I do a search with facets, I get my top 10 terms, and I guess "other"
means there are 1093 more terms that are not shown:

u'facets': {u'wname': {u'_type': u'terms',
                       u'missing': 0,
                       u'other': 1093,
                       u'terms': [{u'count': 23, u'term': u'test_blueheron'},
                                  {u'count': 10, u'term': u'PDF'},
                                  {u'count': 9, u'term': u'MS'},
                                  {u'count': 6, u'term': u'test'},
                                  {u'count': 6, u'term': u'image'},
                                  {u'count': 6, u'term': u'Word'},
                                  {u'count': 5, u'term': u'OpenOffice'},
                                  {u'count': 4, u'term': u'with'},
                                  {u'count': 4, u'term': u'file'},
                                  {u'count': 4, u'term': u'StarOffice'}],

In reply to pere roca ristol (#1), Tue, Sep 18, 2012 at 8:04 AM.

--


(pere roca ristol) #3

No. As stated at
http://www.elasticsearch.org/guide/reference/api/search/facets/,
*other* is the count of facet values not included in the returned facets.
If you use 'size': 10 in your facet query, it creates 10 'groups', and the
values (raw data records) not included in these groups are simply summed
into the 'other' parameter. So, in your case, you have 1093 records that
don't fall into your top 10 facets/groups.
This is not what I need.
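
To make the difference concrete, here is a toy model of the behavior described above. It is not Elasticsearch code; `toy_terms_facet` and the data are made up, and it assumes a single-valued field. It shows that "other" counts records outside the top terms, which is generally not the number of terms left out.

```python
from collections import Counter


def toy_terms_facet(values, size):
    """Toy model of a terms facet over single-valued field values.

    Returns (facet, distinct). facet mimics the "terms"/"other" shape
    quoted in this thread. distinct is the number of unique values; it is
    NOT part of a facet response and is computed here only for comparison.
    """
    counts = Counter(values)
    top = counts.most_common(size)
    # "other" sums the counts of every value outside the top `size` terms.
    other = sum(counts.values()) - sum(count for _, count in top)
    facet = {
        "terms": [{"term": term, "count": count} for term, count in top],
        "other": other,
    }
    return facet, len(counts)


# Made-up data: 13 records spread over 5 distinct kingdoms.
values = (["Animalia"] * 5 + ["Plantae"] * 3 + ["Fungi"] * 2
          + ["Protista"] * 2 + ["Bacteria"])
facet, distinct = toy_terms_facet(values, size=2)
hidden_terms = distinct - len(facet["terms"])
print(facet["other"], hidden_terms)  # prints: 5 3
```

Here 5 records fall outside the top 2 terms, but only 3 terms are hidden, so "other" (or other + size) cannot stand in for count(distinct).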

In reply to Raffaele Sena (#2), Tuesday, 18 September 2012 21:23:50 UTC+2.

--


(Lukáš Vlček) #4

Hi,

You are looking for the number of distinct terms in a given field, right?
I am not sure there is an API for this right now, but you might want to
look at https://github.com/jprante/elasticsearch-index-termlist (it gives
you the list of terms rather than their number). However, you could ask
Jörg whether it would be possible to extend this plugin.

Regards,
Lukas

In reply to pere roca ristol (#3), Wed, Sep 19, 2012 at 10:28 AM.

--

