"totalTermFreq must be at least docFreq" error after upgrading to 7.0.1

After upgrading from 7.0.0 to 7.0.1, my search query fails with "totalTermFreq must be at least docFreq":

  root_cause: [
    {
      type: 'illegal_argument_exception',
      reason: 'totaltermfreq must be at least docfreq, totaltermfreq: 118, docfreq: 1370'
    }
  ],
  type: 'search_phase_execution_exception',
  reason: 'all shards failed',
  phase: 'query',
  grouped: true,
  failed_shards: [ REMOVED ],
  caused_by: { REMOVED },
}

What does it mean and how can I fix it? Is it something to do with my index after the upgrade? It was working fine before the upgrade.
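
For anyone landing here with the same message, a rough reading of what it is complaining about: docFreq is the number of documents that contain a term, and totalTermFreq is the total number of times the term occurs across those documents. Each matching document contains the term at least once, so the second number should never be smaller than the first. The Python sketch below only illustrates that sanity check. The `TermStats` class is a made-up stand-in, not Elasticsearch or Lucene code, and the numbers are the ones from the error above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermStats:
    """Hypothetical stand-in for the per-term statistics of an index."""

    doc_freq: int         # number of documents that contain the term
    total_term_freq: int  # total occurrences of the term across those documents

    def __post_init__(self):
        # Every matching document holds the term at least once, so the
        # occurrence count can never be smaller than the document count.
        if self.total_term_freq < self.doc_freq:
            raise ValueError(
                "totalTermFreq must be at least docFreq, "
                f"totalTermFreq: {self.total_term_freq}, docFreq: {self.doc_freq}"
            )


# Consistent statistics: 2 documents, 5 occurrences in total.
ok = TermStats(doc_freq=2, total_term_freq=5)

# The values from the error message are inconsistent and get rejected.
try:
    TermStats(doc_freq=1370, total_term_freq=118)
except ValueError as err:
    print(err)
```

Since cross_fields is documented as blending term statistics across the fields it searches, a plausible guess is that the blended numbers end up inconsistent. Nothing in this thread confirms that.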

We are hitting the same issue after upgrading.

The failing queries are all multi_match queries using cross_fields. They work if we switch to best_fields or most_fields, but neither of those produces the search accuracy we need.
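
To make the comparison concrete, here is a minimal sketch of building the same multi_match clause with different `type` values. The `multi_match` helper, the `and` operator, and the search text "john" are assumptions for illustration; the two field names are the ones quoted later in this post.

```python
import json


def multi_match(query, fields, match_type="cross_fields", operator="and"):
    """Build a multi_match clause; `fields` maps field name -> boost."""
    return {
        "multi_match": {
            "query": query,
            "type": match_type,
            "operator": operator,
            # Elasticsearch expects boosted fields in "name^boost" form.
            "fields": [f"{name}^{boost}" for name, boost in fields.items()],
        }
    }


fields = {
    "contacts.firstname.english": 1.5,
    "contacts.firstname.ngram": 1.0,
}

# The variant that triggers the error for us on 7.0.1.
failing = multi_match("john", fields)

# Same fields and boosts, but scored per field instead of blended.
workaround = multi_match("john", fields, match_type="best_fields")

print(json.dumps({"query": workaround}, indent=2))
```

Only the `type` changes between the two bodies, which is why the switch avoids the error but also changes how results are ranked.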

We re-indexed one of the problem indexes into a fresh 7.0.1 index, and that did not resolve the issue.

The query also uses multiple fields, with a different boost per field, like so:

"contacts.firstname.english^1.5",
"contacts.firstname.ngram^1.0",

If we change those boosts, we can reduce how often the error shows up. For example, this search can hit over 80 indexes, and in one case it was failing on 57 shards. When we lowered the boost on the name field, the failures dropped to about 30 shards.
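
For tracking how many shards fail from one run to the next, a small sketch like the following can read the `_shards` section of a search response body. The `summarize_shard_failures` helper and the sample response are made up for illustration; the numbers in it are not from our cluster.

```python
from collections import Counter


def summarize_shard_failures(response):
    """Return (failed, total, Counter of failure reason types) for a search response."""
    shards = response.get("_shards", {})
    # The failures list is not guaranteed to hold one entry per failed shard,
    # so the count is read from "failed" rather than from the list length.
    reasons = Counter(
        failure.get("reason", {}).get("type", "unknown")
        for failure in shards.get("failures", [])
    )
    return shards.get("failed", 0), shards.get("total", 0), reasons


# Made-up response fragment; only the "_shards" section matters here.
sample = {
    "_shards": {
        "total": 5,
        "successful": 2,
        "skipped": 0,
        "failed": 3,
        "failures": [
            {
                "shard": 0,
                "index": "example_index",
                "reason": {
                    "type": "illegal_argument_exception",
                    "reason": "totalTermFreq must be at least docFreq",
                },
            }
        ],
    }
}

failed, total, reasons = summarize_shard_failures(sample)
print(f"{failed} of {total} shards failed; reason types: {dict(reasons)}")
```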

We can't find a way to fix this at the query level, and believe it's an Elasticsearch bug.

Interesting that we seem to be the only people with this issue. Are you getting the exact same error message as me? And did it happen from upgrading 7.0.0 to 7.0.1?

You seem to know a lot more about the issue than I do. Do you think you could create a GitHub issue and include the various things you have tried and discovered? I'm a relative newbie, so I don't have much to add to an issue beyond what I wrote above.

No, you are not the only ones having this problem. I encountered the same problem yesterday after upgrading our test cluster from 6.7.1 to 7.0.1.
I also deleted the old affected indices and recreated them, but with the same result.

At the moment I'm also looking for an explanation and a solution.

I also search using multi-match queries with cross_fields. Changing it to best_fields also works.

In my environment it seems that array fields are causing the problem.

I've got a mapping like:

{
	"properties": {
		"description": {
			"type": "text",
			"similarity": "custom_similarity",
			"term_vector" : "with_positions_offsets",
			"analyzer": "standard_analyzer",
			"search_analyzer": "standard_search_analyzer",
			"fields": {
				"ngram": {
					"type": "text",
					"similarity": "custom_similarity",
					"analyzer": "ngram_analyzer",
					"search_analyzer": "standard_search_analyzer"
				},
				"edge_ngram_prefix": {
					"type": "text",
					"similarity": "custom_similarity",
					"analyzer": "edge_ngram_1_analyzer",
					"search_analyzer": "standard_search_analyzer"
				}
			}
		},
		"tags": {
			"type": "text",
			"similarity": "custom_similarity",
			"analyzer": "standard_analyzer",
			"search_analyzer": "standard_search_analyzer",
			"fields": {
				"keyword": {
					"type": "keyword",
					"ignore_above": 256
				}
			}
		}
	}
}

And I post data like:

POST /some_index/_doc
{
	"description": "Some description",
	"tags": ["foo", "bar", "foo bar"]
}

When I search in both fields the query fails.

The query looks like:

{  
   "from":0,
   "size":100,
   "query":{  
      "bool":{  
         "must":[  
            {  
               "function_score":{  
                  "query":{  
                     "bool":{  
                        "must":[  
                           {  
                              "function_score":{  
                                 "query":{  
                                    "multi_match":{  
                                       "query":"foo",
                                       "fields":[  
                                          "description^3.0",
                                          "description.edge_ngram_prefix^0.90000004",
                                          "description.ngram^0.6",
                                          "tags^1.0"
                                       ],
                                       "type":"cross_fields",
                                       "operator":"AND",
                                       "slop":0,
                                       "prefix_length":0,
                                       "max_expansions":50,
                                       "tie_breaker":0.05,
                                       "zero_terms_query":"NONE",
                                       "auto_generate_synonyms_phrase_query":true,
                                       "fuzzy_transpositions":true,
                                       "boost":1.0
                                    }
                                 },
                                 "functions":[  
                                    {  
                                       "filter":{  
                                          "match_all":{  
                                             "boost":1.0
                                          }
                                       },
                                       "field_value_factor":{  
                                          "field":"boost",
                                          "factor":1.0,
                                          "modifier":"none"
                                       }
                                    }
                                 ],
                                 "score_mode":"sum",
                                 "max_boost":3.4028235E38,
                                 "boost":1.0
                              }
                           }
                        ],
                        "adjust_pure_negative":true,
                        "boost":1.0
                     }
                  }
               }
            }
         ],
         "adjust_pure_negative":true,
         "boost":1.0
      }
   }
}

Unfortunately, I have not yet been able to reproduce the problem in a simple example. So far it only happens when I index large amounts of data into my index.

I've removed all those array fields from the query for now, and I no longer get this error.
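
Roughly, the workaround amounts to filtering the boosted field list before the query is built. A minimal sketch, where `drop_fields` is a hypothetical helper and the list is the one from my query above:

```python
def drop_fields(boosted_fields, excluded):
    """Remove entries such as "tags^1.0" whose field name is in `excluded`."""
    return [
        entry
        for entry in boosted_fields
        # Compare only the part before "^", i.e. the field name without its boost.
        if entry.split("^", 1)[0] not in excluded
    ]


fields = [
    "description^3.0",
    "description.edge_ngram_prefix^0.90000004",
    "description.ngram^0.6",
    "tags^1.0",
]

# "tags" is the array field in my mapping.
remaining = drop_fields(fields, {"tags"})
print(remaining)
```

This changes what the search can match, since terms that occur only in the tags are no longer found.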

The query looks fine, and removing those fields isn't a solution. This is almost certainly a bug and one that has come up in a minor release.

I'm really surprised nobody from the Elasticsearch team has commented.

Can you add those details to a GitHub issue at https://github.com/elastic/elasticsearch/issues/? The details you've shared will be useful for them to determine the cause.

Done: https://github.com/elastic/elasticsearch/issues/41934

They have fixed the problem. See pull request: https://github.com/elastic/elasticsearch/pull/41938

Thanks, ChrisSto, for submitting the issue and for coming up with a reproducible test case.

Glad to see a PR already for fixing it.
