How does the prefix filter compare to the term filter performance-wise?
For example, to match "o1382334", a prefix filter on "o" only needs to
check the first letter, whereas a term filter needs to match all eight
characters of "o1382334". Assuming they match an equal number of documents,
it seems that the prefix filter is more "lightweight".
Could someone who understands the underlying algorithm of the term filter
comment on this?
A related question: is it fine for performance to use a very long string as
the term in a term filter, such as an MD5 hash instead of a simple integer
ID?
Thanks to the inverted index, each term is only looked up once per segment,
so I don't think the number of characters to compare would have any
noticeable performance impact.
One benefit of the term filter, though, might be that the terms dictionary
index can tell that a term is not contained in the terms dictionary without
going to disk.
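
To make the "looked up once per segment" point concrete, here is a minimal
sketch in Python. It is a toy model, not Lucene's actual implementation:
the class ToySegment and its methods are hypothetical names, and a sorted
list with binary search stands in for the real terms dictionary. It only
illustrates that a term filter resolves a single dictionary entry, while a
prefix filter has to walk every term that starts with the prefix and merge
their postings.

```python
from bisect import bisect_left


class ToySegment:
    """Toy stand-in for one index segment (hypothetical, not Lucene's layout)."""

    def __init__(self, docs):
        # docs[i] is the single term stored for document i.
        postings = {}
        for doc_id, term in enumerate(docs):
            postings.setdefault(term, []).append(doc_id)
        # Sorted terms dictionary plus a parallel list of postings lists.
        self.terms = sorted(postings)
        self.postings = [postings[t] for t in self.terms]

    def term_filter(self, term):
        # One dictionary lookup, regardless of how many documents match.
        i = bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return list(self.postings[i])
        return []  # term is absent from this segment

    def prefix_filter(self, prefix):
        # Seek to the first candidate, then visit every term sharing the
        # prefix and merge the postings of each one.
        i = bisect_left(self.terms, prefix)
        matched = []
        while i < len(self.terms) and self.terms[i].startswith(prefix):
            matched.extend(self.postings[i])
            i += 1
        return sorted(matched)


segment = ToySegment(["o1382334", "o1382335", "p42", "o1382334"])
exact = segment.term_filter("o1382334")
by_prefix = segment.prefix_filter("o")
missing = segment.term_filter("md5-like-value-not-indexed")
```

In this model the cost of the term filter is dominated by the single lookup
and the size of the postings list, not by the length of the term, whereas
the cost of the prefix filter grows with the number of distinct terms that
share the prefix.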