What's the difference between terms and term (with bool should)?

I've always wondered, for example, about the following statements.
A:

{
  "bool": {
    "should": [
      {
        "term": {
          "userType": {
            "value": 0
          }
        }
      },
      {
        "term": {
          "userType": {
            "value": 1
          }
        }
      },
      {
        "term": {
          "userType": {
            "value": 2
          }
        }
      },
      {
        "term": {
          "userType": {
            "value": 3
          }
        }
      },
      {
        "term": {
          "userType": {
            "value": 4
          }
        }
      },
      {
        "term": {
          "userType": {
            "value": 5
          }
        }
      },
      ...
    ]
  }
}

B:

{
  "terms": {
    "userType": [
      0,
      1,
      2,
      3,
      4,
      5,
      ...
    ]
  }
}

Which of them is going to be faster? Is speed also related to the number of terms?

There are a few differences.

  • The simplest to see is verbosity: a terms query just lists an array of values, while the equivalent term queries need much more JSON.
  • A terms query does not score matches based on the IDF (the rareness) of the matched terms; term queries do.
  • term queries wrapped in a bool query are limited to 1024 clauses by default, because of the Boolean query's max clause count.
  • A terms query can hold more values than that.

By default, Elasticsearch limits the terms query to a maximum of 65,536 terms. You can change this limit using the index.max_terms_count setting.
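To make the two shapes and their limits concrete, here is a minimal Python sketch that builds both request bodies from the same list of values. The helper names (bool_should_of_terms, terms_query) and the two constants are hypothetical, introduced only for illustration; the numbers are the defaults quoted in this thread and may be configured differently on a given cluster.

```python
import json

# Defaults quoted in the thread. Both are configurable, so treat them as assumptions.
BOOL_MAX_CLAUSES = 1024
TERMS_MAX_COUNT = 65536


def bool_should_of_terms(field, values):
    """Style A: one term clause per value inside bool.should."""
    values = list(values)
    if len(values) > BOOL_MAX_CLAUSES:
        raise ValueError("too many clauses for a bool query")
    return {"bool": {"should": [{"term": {field: {"value": v}}} for v in values]}}


def terms_query(field, values):
    """Style B: a single terms query listing every value."""
    values = list(values)
    if len(values) > TERMS_MAX_COUNT:
        raise ValueError("too many values for a terms query")
    return {"terms": {field: values}}


# Print the compact form of query B for the six values shown in the question.
print(json.dumps(terms_query("userType", range(6))))
```

With 2,000 values the first helper raises while the second still returns a body, which mirrors the difference in limits described above.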

Which of them is going to be faster? Is speed also related to the number of terms?

It depends. They execute differently. term queries do more expensive scoring but do so lazily. They may "skip" over docs during execution because other, more selective criteria may advance the stream of matching docs being considered.
The terms query doesn't do expensive scoring but is more eager: it creates the equivalent of a single bitset, with a one or zero for every doc, by ORing all the potential matching docs up front. Many terms can share the same bitset, which is what lets it scale to large numbers of terms.
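The eager versus lazy distinction can be illustrated with a toy model in Python. This is only a sketch of the idea, not how Lucene implements either query: the POSTINGS data, MAX_DOC, and both function names are made up for the example.

```python
import heapq

# Toy postings lists: term -> sorted doc ids. Made-up data for illustration.
POSTINGS = {
    "a": [1, 4, 9],
    "b": [2, 4, 7],
    "c": [9, 12],
}
MAX_DOC = 16  # hypothetical number of docs in the index


def eager_bitset(terms):
    """terms-style: OR every postings list into one bitset before anything is consumed."""
    bits = [False] * MAX_DOC
    for term in terms:
        for doc in POSTINGS.get(term, []):
            bits[doc] = True
    return bits


def lazy_union(terms):
    """bool/should-style: merge the lists on demand, yielding one doc id at a time."""
    last = None
    for doc in heapq.merge(*(POSTINGS.get(t, []) for t in terms)):
        if doc != last:  # a doc matched by several terms is reported once
            yield doc
            last = doc
```

The eager version reads every posting no matter how few results the caller ends up needing, but its output is one shared structure however many terms are ORed in. The lazy version only does work when the caller asks for the next doc, so a caller that stops early or jumps ahead never pays for the rest.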

It seems like the term query does scoring but probably visits fewer documents, while the terms query does no scoring but always collects all documents that meet the conditions.

I don't quite understand. Could you explain it more clearly or point me to some relevant documentation?

Thank you very much!

Each term has an ordered "postings list" of document ids, e.g.

foo -> 1, 7, 9, 12, 32, 44, ......
bar -> 41, 44, 99

If your query has a must clause for bar and foo is optional, then the bool query can tell the optional foo clause to efficiently skip over anything less than 41. The lists of doc ids are encoded in blocks that support skipping over whole chunks of these postings lists.
Also, if track_total_hits is false, documents with non-competitive scores can be skipped. When bar is rare and we've already matched enough high-quality docs, the bool query can stop asking for more foo matches if foo is a boring, common word, because we know those matches aren't going to be interesting enough to make the top results.
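The skipping described for foo and bar can be sketched in Python as below. It is a toy model under stated assumptions: the Postings class and must_with_optional are hypothetical names, a binary search stands in for the block-level skip data mentioned above, and foo contains only the ids listed in the example (the elided tail is left out).

```python
from bisect import bisect_left


class Postings:
    """A sorted list of doc ids with an advance(target) operation."""

    def __init__(self, doc_ids):
        self.doc_ids = doc_ids
        self.pos = 0
        self.visited = 0  # how many entries we actually landed on

    def advance(self, target):
        """Jump to the first doc id >= target; return None when exhausted."""
        # Binary search is a stand-in for skipping over whole encoded blocks.
        self.pos = bisect_left(self.doc_ids, target, self.pos)
        if self.pos == len(self.doc_ids):
            return None
        self.visited += 1
        return self.doc_ids[self.pos]


def must_with_optional(must_ids, optional_ids):
    """Drive the optional list from the must list, as a bool query would."""
    optional = Postings(optional_ids)
    results = []
    for doc in must_ids:
        hit = optional.advance(doc)  # never looks at ids below doc
        results.append((doc, hit == doc))
    return results, optional.visited


foo = [1, 7, 9, 12, 32, 44]  # only the ids listed in the thread
bar = [41, 44, 99]
results, visited = must_with_optional(bar, foo)
print(results)  # [(41, False), (44, True), (99, False)]
print(visited)  # 2
```

The foo entries 1, 7, 9, 12 and 32 are never landed on: the first advance goes straight to 44, because bar's first doc is 41.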


Thanks Mark, it's really helpful!
