In particular, the phrase "Term facet causes the relevant field values to be loaded into memory." confuses me (maybe because of my poor command of English).
If I have 100 million documents, each with a field 'field1' of type 'byte', and only 30 distinct values (terms) are ever used across all the documents, will a term facet search on 'field1' load into RAM:
a) 100 million times one byte (relevant = all the values),
b) just 30 times one byte (relevant = all the distinct values), or
c) N <= 30 times one byte, where N is the number of distinct values among the documents matching the facet filter (relevant = all the distinct values that the filters can possibly return)?
Also, is there a way to get approximate counts per term with less memory use, and/or that could be used when the count per term is really large?
BTW, is 1<<32 = 4294967296 the maximum count one can get, or does the count use a float?
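To make the gap between readings (a) and (b) concrete, here is a back-of-envelope calculation. It is illustrative only: the constants mirror the numbers in the question, and it ignores whatever per-segment overhead the real field-data structures add.

```python
# Rough memory estimates for the two readings of
# "relevant field values loaded into memory".

NUM_DOCS = 100_000_000   # 100 million documents
DISTINCT_TERMS = 30      # distinct values ever used in 'field1'
BYTES_PER_VALUE = 1      # field type 'byte'

# (a) one entry per document: ~100 MB before any overhead
per_document = NUM_DOCS * BYTES_PER_VALUE

# (b) one entry per distinct term: 30 bytes
per_distinct_term = DISTINCT_TERMS * BYTES_PER_VALUE

print(per_document)       # 100000000
print(per_distinct_term)  # 30

# Sanity check on the overflow question: 1 << 32 is indeed 4294967296.
print(1 << 32)            # 4294967296
```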
I am not sure whether you can combine this plugin with a facet 'filter', i.e. not only do a count by date, but do a count by date for documents matching a condition (e.g. documents belonging to a user).
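For reference, the legacy facet API accepted a facet_filter alongside each facet definition, which restricts the facet counts without changing the query hits. A sketch of what that might look like (the field names 'date' and 'user_id' are made up for illustration, and whether this particular plugin honours facet_filter would need to be checked):

```json
{
  "query": { "match_all": {} },
  "facets": {
    "count_by_date": {
      "date_histogram": { "field": "date", "interval": "day" },
      "facet_filter": { "term": { "user_id": "42" } }
    }
  }
}
```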