Term filter length on performance and Prefix filter

arthurx · April 29, 2014, 4:30am

How does the prefix filter compare to the term filter performance-wise?

For example, to match "o1382334", a prefix filter on "o" only needs to check
the first letter, whereas a term filter needs to match all eight characters
of "o1382334". Assuming they match an equal number of documents, it seems
that the prefix filter is more "lightweight".

Could someone who understands the underlying algorithm of the term filter
comment on this?

A related question: is it OK for performance to use a very long string in a
term filter, such as an MD5 hash, instead of a simple integer ID?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cc2f2637-42fd-49b3-9ef9-2560f87a9cc2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jpountz · April 29, 2014, 10:09am

Thanks to the inverted index, each term is only looked up once per segment,
not once per document. So I don't think the number of characters to compare
would have any performance impact.
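
To illustrate the point about per-segment lookups, here is a minimal sketch of a toy segment in Python. It is not how Lucene or Elasticsearch actually store terms; the `Segment` class and its method names are hypothetical and only model the idea of a sorted terms dictionary with one postings list per term. The term filter does a single dictionary lookup, while the prefix filter seeks to the first candidate term and merges the postings of every term that shares the prefix.

```python
from bisect import bisect_left


class Segment:
    """Toy segment (hypothetical): sorted terms, each with a postings list."""

    def __init__(self, docs):
        postings = {}
        for doc_id, term in enumerate(docs):
            postings.setdefault(term, []).append(doc_id)
        self.terms = sorted(postings)
        self.postings = [postings[t] for t in self.terms]

    def term_filter(self, term):
        # One binary search in the terms dictionary, regardless of how
        # many documents the segment holds.
        i = bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return list(self.postings[i])
        return []

    def prefix_filter(self, prefix):
        # Seek to the first term >= prefix, then walk every term that
        # starts with the prefix and merge their postings.
        i = bisect_left(self.terms, prefix)
        matched = []
        while i < len(self.terms) and self.terms[i].startswith(prefix):
            matched.extend(self.postings[i])
            i += 1
        return sorted(matched)


segment = Segment(["o1382334", "o1382335", "p42", "o1382334"])
term_hits = segment.term_filter("o1382334")
prefix_hits = segment.prefix_filter("o")
```

In this model the string comparison cost is paid per dictionary lookup, not per matching document, so the length of the term matters little next to reading the postings. A prefix that expands to many distinct terms has more postings lists to merge.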

One benefit of the term filter, though, might be that the terms dictionary
index can tell that a term is not contained in the terms dictionary without
going to disk.
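
The idea of rejecting a missing term from memory can be sketched as follows. This is a loose model under stated assumptions, not the real terms index: the `ToyTermsDictionary` class, the fixed `prefix_len`, and the `disk_reads` counter are all hypothetical, and a plain dict stands in for on-disk term blocks.

```python
class ToyTermsDictionary:
    """Hypothetical model: an in-memory prefix index over 'on-disk' blocks."""

    def __init__(self, terms, prefix_len=2):
        self.prefix_len = prefix_len
        self.blocks = {}  # stands in for term blocks stored on disk
        for term in sorted(terms):
            self.blocks.setdefault(term[:prefix_len], []).append(term)
        self.index = set(self.blocks)  # small structure kept in memory
        self.disk_reads = 0  # counts simulated block reads

    def contains(self, term):
        key = term[: self.prefix_len]
        if key not in self.index:
            # No block can hold this term, so it is rejected in memory.
            return False
        # Only now is the (simulated) on-disk block read and searched.
        self.disk_reads += 1
        return term in self.blocks[key]


dictionary = ToyTermsDictionary(["o1382334", "o1382335", "p42"])
absent_without_read = dictionary.contains("x999")
reads_after_absent = dictionary.disk_reads
present = dictionary.contains("o1382334")
reads_after_present = dictionary.disk_reads
```

A full term gives the index more characters to rule out a match early, which is one way to read the possible benefit described above.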

--
Adrien Grand

