It seems that the require_field_match highlight option does not work

Maybe I've missed something, but it seems to me that the highlight option require_field_match does not work correctly (ES 7.9.3, but I also tested it with older versions).

A very simple example follows.

Take the following index with one document:

POST test_index/_doc
{
  "title": "just one",
  "text": "something one two something else three"
}

I want to get documents where all of the words "one", "two" and "three" occur in text, or all of them occur in title. This could be done in multiple ways, but to avoid ambiguity let's use query_string.query. And of course I want appropriate highlights so I can see why a document was matched.

Using the following query, the document is matched:

GET test_index/_search
{
  "highlight": {
    "fields": {
      "title": {},
      "text": {}
    }
  },
  "query": {
    "query_string": {
      "query": "title:(one AND two AND three) OR text:(one AND two AND three)"
    }
  }
}

But I get these highlights:

"highlight" : {
    "text" : [
      "something <em>one</em> <em>two</em> something else <em>three</em>"
    ],
    "title" : [
      "just <em>one</em>"
    ]
}

I don't want the title field highlighted this way, because this field doesn't match the query. Until now I thought this situation was the reason the require_field_match parameter exists, so I tried this query:

GET test_index/_search
{
  "highlight": {
    "fields": {
      "title": {
        "require_field_match": "true"
      },
      "text": {
        "require_field_match": "true"
      }
    }
  },
  "query": {
    "query_string": {
      "query": "title:(one AND two AND three) OR text:(one AND two AND three)"
    }
  }
}

But I get exactly the same results, with highlights for both the text and title fields. I expect to get a highlight for the text field only (because the title field contains only one of those three words).

Does anybody know why it doesn't work as expected (even though the query is so simple) and how to get more precise highlights?

Thanks,
Zdenek

Term highlighting is typically based on bags of words, not any careful consideration of AND/OR logic.

All require_field_match:false does is relax the constraint that a word has to be found in the same field the query clause used.
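To make the bag-of-words behaviour concrete, here is a toy Python model. It is only an illustration, not the Lucene or Elasticsearch implementation: `toy_highlight` and `terms_by_field` are hypothetical names, and the tokenisation is a plain regex. It wraps every query term it finds in a field and ignores AND/OR entirely, which is why "just one" still gets a highlight.

```python
import re

def toy_highlight(field, text, terms_by_field, require_field_match=True):
    """Toy bag-of-words highlighter (illustration only, not Lucene's code).

    terms_by_field maps a field name to the set of terms that the query
    clauses targeting that field contain. Boolean operators are ignored.
    """
    if require_field_match:
        # Only terms from clauses that target this field are eligible.
        terms = set(terms_by_field.get(field, ()))
    else:
        # Terms from clauses on any field are eligible.
        terms = set().union(*terms_by_field.values())

    def wrap(match):
        word = match.group(0)
        return f"<em>{word}</em>" if word.lower() in terms else word

    return re.sub(r"\w+", wrap, text)

# Terms extracted by hand from
# title:(one AND two AND three) OR text:(one AND two AND three)
terms = {
    "title": {"one", "two", "three"},
    "text": {"one", "two", "three"},
}
print(toy_highlight("title", "just one", terms))
print(toy_highlight("text", "something one two something else three", terms))
```

In this model require_field_match only makes a difference when the query has no clause on the field being highlighted. Because the query in this thread has a title clause, "one" is an eligible term for title either way.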

It's impossible to satisfy all of the objectives in a highlighter (summarising long texts, respecting all query logic, favouring high-value words over common ones, being fast etc). This is why different people at different times have taken up the challenge of writing a better highlighter, with different degrees of success, and why Elasticsearch has a number of implementations to choose from.

The description in the Elasticsearch documentation is quite unfortunate, though:

require_field_match
By default, only fields that contains a query match are highlighted. Set require_field_match to false to highlight all fields. Defaults to true.

From your description it seems to me that a field is highlighted if it contains any word that appears anywhere in the query, even though the field itself does not match the query.

And for the plain highlighter, the documentation says the following:

To accurately reflect query logic, it creates a tiny in-memory index and re-runs the original query criteria through Lucene’s query execution planner to get access to low-level match information for the current document. This is repeated for every field and every document that needs to be highlighted.

That sounds heavy enough to me that it should be able to simply distinguish whether the field matches the query or not.

I wrote the original plain highlighter some 20 years ago.
Back then it used Query.extractTerms to get a list of terms, but that changed last year.

I understand there's more work planned to expose match information from Lucene for better highlighting but generally I wouldn't have high expectations about full representation of Boolean logic in highlighters.

Thanks :)

Do you have any idea how to solve the following scenario (it seems like nothing special to me)?

I have documents with Title and Text fields (you can imagine a newspaper article with a headline and body text). I want to let users search the documents in a user-friendly, Google-like way (that's why I prefer query_string.query), and as a result I want to display the highlighted Title and Text.

Do you see any alternative way to highlight each field only if that field matches the query (not if it contains just one word from the query)?

It is still hard for me to believe that this is not possible.

It's not a problem we've worked hard on solving because it's not a given users want it to work the way you describe.

"query": "title:(one AND two AND three) OR text:(one AND two AND three)"

All the returned documents are guaranteed to match the query. Even if a title does not contain ALL the required terms, the text field will, and it is often useful to see any also-ran partial matches in the title highlighted.
The "named" query feature provides a way to find out which subclauses matched in a document, so you might be able to use that to figure out whether the title was a genuine match or not, and then use the highlight or the original _source value accordingly.
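A minimal sketch of that named-query approach, in Python so the post-processing step is visible. It assumes the single query_string is split into one clause per field inside a bool/should, each tagged with `_name`; the names `title_match` and `text_match`, both helper functions, and the sample hit are made up for illustration (the hit is hand-written from the highlights quoted earlier, not captured from a cluster).

```python
def build_search_body(expression="one AND two AND three", fields=("title", "text")):
    """Build a search body with one named query_string clause per field."""
    should = [
        {
            "query_string": {
                "query": expression,
                "default_field": field,
                "_name": f"{field}_match",  # reported back in matched_queries
            }
        }
        for field in fields
    ]
    return {
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
        "highlight": {"fields": {field: {} for field in fields}},
    }

def genuine_highlights(hit, fields=("title", "text")):
    """Keep highlights only for fields whose named clause matched this hit."""
    matched = set(hit.get("matched_queries", []))
    highlight = hit.get("highlight", {})
    return {
        field: highlight[field]
        for field in fields
        if f"{field}_match" in matched and field in highlight
    }

# Hand-written hit, shaped like one entry of hits.hits in a search response.
sample_hit = {
    "_source": {
        "title": "just one",
        "text": "something one two something else three",
    },
    "matched_queries": ["text_match"],
    "highlight": {
        "text": ["something <em>one</em> <em>two</em> something else <em>three</em>"],
        "title": ["just <em>one</em>"],
    },
}
print(genuine_highlights(sample_hit))
```

For fields dropped by `genuine_highlights`, the display code can fall back to the plain value from `_source`, as suggested above.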

I still think that the description of require_field_match in the documentation is a bit deceptive, but as you write, that is because of my specific point of view :) Never mind.

Thanks a lot for your time and for the hint about using named queries as a workaround. I believe it will fit nicely into our scenario.
