Term facet memory consideration in the documentation

Tux_Racer · January 10, 2013, 10:58pm

Hello List,

I am not sure I understand the 'Memory consideration' paragraph at:

http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html

in particular the phrase "Term facet causes the relevant field values to
be loaded into memory." (maybe because of my bad command of English)

If I have 100 million documents, each document having a field 'field1'
of type 'byte', and if only 30 different values (or terms) are ever used
across all the documents, will a term facet search on the 'field1' load
into RAM

a) 100 million times one byte, (relevant=all the values)
b) or just 30 times one byte, (relevant=all the distinct values)
c) or just the N<=30 times one byte where N is the number of documents
matching the facet filter? (relevant=all the distinct values that one
can possibly return with the filters)
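For scale, the raw payload implied by each of the three readings can be worked out with a short sketch. It only does the arithmetic from the question (one byte per value) and does not say which reading is correct; it also ignores any per-document or per-segment overhead. The function name and the figure of 12 distinct matching values are hypothetical, chosen only for illustration.

```python
def candidate_footprints(num_docs, num_distinct, num_distinct_matching,
                         bytes_per_value=1):
    """Raw bytes implied by readings a), b) and c) of 'relevant field values'.

    Assumption: only the values themselves are counted, with no overhead
    for ordinals, arrays, or object headers.
    """
    if not 0 <= num_distinct_matching <= num_distinct:
        raise ValueError("matching distinct values cannot exceed distinct values")
    return {
        "a": num_docs * bytes_per_value,               # one value per document
        "b": num_distinct * bytes_per_value,           # one value per distinct term
        "c": num_distinct_matching * bytes_per_value,  # distinct terms after filtering
    }


# Figures from the question: 100 million documents, 30 distinct values.
# 12 is a made-up example of N <= 30.
sizes = candidate_footprints(100_000_000, 30, 12)
for reading, size in sizes.items():
    print(f"{reading}) {size} bytes ({size / 2**20:.2f} MiB)")
```

Reading a) comes to roughly 95 MiB of raw values, while b) and c) are a few dozen bytes, so the three readings differ by several orders of magnitude.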

Also, is there a way to get approximate counts per term, but with less
memory use and/or that could be used when the count per term is really
large?
BTW is 1<<32=4294967296 the maximum count one can get, or does the count
use a float?
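As a sanity check on the numbers in that question, the sketch below compares 1<<32 with the usual fixed-width integer limits and shows where a 32-bit float stops representing every integer exactly. Which type the facet count actually uses is not stated anywhere in this thread, so this is only arithmetic, not a statement about Elasticsearch internals.

```python
import struct

# 1 << 32 is 2**32, i.e. one past the largest unsigned 32-bit value.
assert (1 << 32) == 4294967296

UINT32_MAX = 2**32 - 1        # 4294967295
INT32_MAX = 2**31 - 1         # 2147483647
INT64_MAX = 2**63 - 1         # 9223372036854775807
FLOAT32_EXACT_LIMIT = 2**24   # 16777216: above this, float32 skips integers


def as_float32(n):
    """Round-trip an integer through a 32-bit IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]


# A count of 2**24 + 1 cannot be stored exactly in a float32.
print(as_float32(FLOAT32_EXACT_LIMIT) == FLOAT32_EXACT_LIMIT)          # True
print(as_float32(FLOAT32_EXACT_LIMIT + 1) == FLOAT32_EXACT_LIMIT + 1)  # False
```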

Thanks
TuXRaceR

--

dadoonet · January 11, 2013, 2:23am

IMHO, it's c)

BTW for your last question, look at: https://github.com/ptdavteam/elasticsearch-approx-plugin

HTH

David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


--

Tux_Racer · January 11, 2013, 11:11am

Thank you David,

actually after reading

http://elasticsearch-users.115913.n3.nabble.com/terms-facet-explodes-memory-td3258748.html

I would exclude c)

Thank you for the very interesting link

I am not sure whether this plugin can be combined with a facet 'filter',
i.e. not only a count by date, but a count by date restricted to documents
matching a condition (e.g. documents belonging to a user).
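To make the combination concrete, here is a minimal sketch of what "a count by date for one user's documents" looks like with the stock date_histogram facet and a facet_filter, built as a Python dict and printed as a JSON request body. This is the built-in facet, not the plugin's own facet type, and whether the plugin honours facet_filter is exactly the open question. The field names `created_at` and `user_id`, the facet name `per_day`, and the value `u42` are hypothetical.

```python
import json


def filtered_date_count(date_field, filter_field, filter_value, interval="day"):
    """Build a search body with a date_histogram facet limited by a facet_filter.

    Assumption: the pre-1.0 facets API, where every facet accepts an
    optional 'facet_filter' that restricts the documents it counts.
    """
    return {
        "query": {"match_all": {}},
        "size": 0,  # only the facet counts are wanted, not the hits
        "facets": {
            "per_day": {
                "date_histogram": {"field": date_field, "interval": interval},
                "facet_filter": {"term": {filter_field: filter_value}},
            }
        },
    }


body = filtered_date_count("created_at", "user_id", "u42")
print(json.dumps(body, indent=2))
```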

Thanks
TuXRaceR

