What's the difference between terms and term (with bool should)?

yeziblo · September 27, 2021, 3:26am

I've always wondered, for example, about the following statements.
A:

{
  "bool": {
    "should": [
      {
        "term": {
          "userType": {
            "value": 0
          }
        }
      },
      {
        "term": {
          "userType": {
            "value": 1
          }
        }
      },
      {
        "term": {
          "userType": {
            "value": 2
          }
        }
      },
      {
        "term": {
          "userType": {
            "value": 3
          }
        }
      },
      {
        "term": {
          "userType": {
            "value": 4
          }
        }
      },
      {
        "term": {
          "userType": {
            "value": 5
          }
        }
      },
      ...
    ]
  }
}

B:

{
  "terms": {
    "userType": [
      0,
      1,
      2,
      3,
      4,
      5,
      ...
    ]
  }
}

Which of them is going to be faster? Is speed also related to the number of terms?

Mark_Harwood · September 27, 2021, 8:19am

There are a few differences.

The simplest to see is the verbosity: a terms query just lists an array of values, while term queries require more JSON.
terms queries do not score matches based on the IDF (rareness) of the matched terms; term queries do.
term queries wrapped in a bool query can only have up to 1024 values because of the Boolean max clause count.
terms queries can have more terms.

By default, Elasticsearch limits the terms query to a maximum of 65,536 terms. You can change this limit using the index.max_terms_count setting.
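
To make the two forms and their size limits concrete, here is a minimal Python sketch that builds both request bodies from one list of values. The helper names (`bool_should_query`, `terms_query`) and the two constants are illustrative assumptions, not part of any Elasticsearch client; the numbers are the ones quoted above, and both are configurable on a real cluster.

```python
import json

# Limits quoted in this thread. Both are configurable, so treat them as assumptions.
MAX_BOOL_CLAUSES = 1024   # Boolean max clause count
MAX_TERMS_COUNT = 65_536  # default for index.max_terms_count


def bool_should_query(field, values):
    """Form A: one term clause per value inside bool/should."""
    values = list(values)
    if len(values) > MAX_BOOL_CLAUSES:
        raise ValueError(
            f"{len(values)} clauses exceed the assumed bool limit of {MAX_BOOL_CLAUSES}"
        )
    return {"bool": {"should": [{"term": {field: {"value": v}}} for v in values]}}


def terms_query(field, values):
    """Form B: a single terms query listing every value."""
    values = list(values)
    if len(values) > MAX_TERMS_COUNT:
        raise ValueError(
            f"{len(values)} terms exceed the assumed terms limit of {MAX_TERMS_COUNT}"
        )
    return {"terms": {field: values}}


if __name__ == "__main__":
    user_types = [0, 1, 2, 3, 4, 5]
    print(json.dumps(bool_should_query("userType", user_types), indent=2))
    print(json.dumps(terms_query("userType", user_types), indent=2))
```

With 2000 values, the bool/should helper refuses to build the body while the terms helper still accepts it, which mirrors the difference in limits described above.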

Which of them is going to be faster? Is speed also related to the number of terms?

It depends. They execute differently. term queries do more expensive scoring but do so lazily. They may "skip" over docs during execution because other, more selective criteria may advance the stream of matching docs being considered.
The terms query doesn't do expensive scoring but is more eager: it creates the equivalent of a single bitset, with a one or zero for every doc, by ORing all the potential matching docs up front. Many terms can share the same bitset, which is what provides the scalability in term numbers.
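
The eager side of this can be pictured with a small Python sketch. It is a toy model, not how Lucene is implemented: the bitset is a plain Python int, and the postings in the usage note are made-up data for illustration.

```python
def terms_bitset(postings, terms):
    """Eagerly OR the postings of every requested term into one bitset.

    The bitset is a Python int: bit i is 1 when doc i matches any term.
    """
    bits = 0
    for term in terms:
        for doc_id in postings.get(term, ()):
            bits |= 1 << doc_id
    return bits


def matching_docs(bits):
    """Return the doc ids whose bit is set, in increasing order."""
    docs = []
    doc_id = 0
    while bits:
        if bits & 1:
            docs.append(doc_id)
        bits >>= 1
        doc_id += 1
    return docs


if __name__ == "__main__":
    # Hypothetical postings: userType value -> sorted doc ids.
    postings = {0: [1, 4], 1: [2, 4, 7], 2: [5]}
    print(matching_docs(terms_bitset(postings, [0, 1, 2])))
```

However many terms are requested, the result is still one bit per document, which is the sense in which many terms share the same bitset.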

yeziblo · September 27, 2021, 8:56am

It seems like the term query will do scoring, but probably queries for fewer documents, and the terms query does not do scoring, but always queries for all documents that meet the conditions.

I don't quite understand. Could you explain it more clearly or point me to some relevant documentation?

Thank you very much

Mark_Harwood · September 27, 2021, 9:37am

Each term has an ordered "postings list" of documents e.g.

foo -> 1, 7, 9, 12, 32, 44, ......
bar -> 41, 44, 99

If your query has a must clause for bar and foo is optional, then a bool query can tell the optional foo clause to efficiently skip reading over anything less than 41. The lists of doc ids are encoded in blocks that support skipping over whole chunks of these postings lists.
Also, if track_total_hits is false then documents with non-competitive scores can be skipped over. When bar is rare and we've already matched enough high-quality docs, the bool query can stop asking for more foo matches if foo is a boring, common word, because we know these matches aren't going to be interesting enough to make the top results.
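
The skipping over the foo postings list can be sketched in Python as follows. This is a simplified model under stated assumptions: `PostingsIterator` and `must_with_optional` are hypothetical names, a binary search stands in for the block-level skip data, and only the listed prefix of the foo postings is used.

```python
from bisect import bisect_left


class PostingsIterator:
    """Cursor over a sorted list of doc ids with an advance(target) operation."""

    def __init__(self, doc_ids):
        self.doc_ids = doc_ids
        self.pos = 0

    def advance(self, target):
        """Move to the first doc id >= target and return it, or None if exhausted."""
        # Binary search from the current position stands in for block skipping.
        self.pos = bisect_left(self.doc_ids, target, self.pos)
        if self.pos < len(self.doc_ids):
            return self.doc_ids[self.pos]
        return None


def must_with_optional(must_ids, optional_ids):
    """Return (doc_id, optional_also_matched) for every doc in the must list."""
    optional = PostingsIterator(optional_ids)
    results = []
    for doc_id in must_ids:
        # The must clause drives iteration; the optional clause only jumps forward.
        results.append((doc_id, optional.advance(doc_id) == doc_id))
    return results


if __name__ == "__main__":
    foo = [1, 7, 9, 12, 32, 44]  # listed prefix of the foo postings above
    bar = [41, 44, 99]
    print(must_with_optional(bar, foo))
```

Because bar drives the iteration, the foo cursor jumps straight to 44 on the first call, and the entries 1, 7, 9, 12 and 32 are never handed back to the caller.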

yeziblo · September 28, 2021, 8:50am

Thanks Mark, it's really helpful!

