Highlighting and text_expansion query

Mark_Harwood1 · May 30, 2023, 11:58am

I've been playing with the new ELSER model and the text_expansion query in 8.8, which looks to be matching OK. Now I want end users to understand why documents matched, but I can't get highlighting to work. Is highlighting supported with this query?
I've tried setting require_field_match to 'false' on the highlighter and targeting the text field but can't get it to highlight anything.
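
For anyone trying to reproduce this, here is a rough sketch of the kind of request described above, built as a Python dict. The field name expanded_text.predicted_value is taken from the explain output quoted in this thread; the text field name, the model ID and the sample question are assumptions for illustration only, so substitute your own mapping.

```python
import json

# Assumed names: "expanded_text.predicted_value" comes from the explain output
# quoted in this thread; "text" and ".elser_model_1" are illustrative guesses.
def build_search_body(question: str) -> dict:
    """Build a search body combining text_expansion with a highlight request."""
    return {
        "query": {
            "text_expansion": {
                "expanded_text.predicted_value": {
                    "model_id": ".elser_model_1",
                    "model_text": question,
                }
            }
        },
        "highlight": {
            # Ask the highlighter to consider fields the query did not target.
            "require_field_match": False,
            "fields": {"text": {}},
        },
    }

body = build_search_body("how do I reset my password")
print(json.dumps(body, indent=2))
```

As the replies below explain, a request shaped like this matches documents but returns no highlights in 8.8.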

I had a quick poke around the code and made the following findings:
The low-level 'explain' descriptions look like the outputs of something other than regular 'term' queries:

"Linear function on the expanded_text.predicted_value field for the xxxx feature, computed as w * S from:",

...which I assume is why they don't highlight. The new text expansion query looks to be asking for regular term queries, but the target 'rank features' field turns these into FeatureQuery objects, which I guess are unknown to the existing highlighters.
Is the boost for each expanded term based on its relevance to the query (a weight computed on the fly), its relevance to the document (stored in the feature field), or a mix of both?

Tom_Veasey · June 2, 2023, 11:01pm

Hey Mark!

This is a known limitation and is something we will look to address asap.

First a little background. The term weights are computed by a function of the embedding vectors generated by the text tokens. Each token can generate a weight for any term, but only the token with the maximum weight is used to compute the document score. That is, for each term that appears in both the query expansion and the document expansion, we multiply the two maximum token weights and add the product to the score. Note that in this context tokens are the BERT vocab tokens, i.e. words or word pieces. This is handy because there is a many-to-1 relationship between tokens and words, and we can use this to compute the contribution to the match score for each word in the query and doc. The plan is to highlight the top scoring words.
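
The scoring described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual ELSER or Elasticsearch implementation: the function names, the token-to-word mapping and all weights are made up for the example.

```python
from collections import defaultdict

def max_term_weights(token_term_weights):
    """Map {token: {term: weight}} to {term: (max weight, token that produced it)}."""
    best = {}
    for token, terms in token_term_weights.items():
        for term, weight in terms.items():
            if term not in best or weight > best[term][0]:
                best[term] = (weight, token)
    return best

def score_and_contributions(query_tokens, doc_tokens, doc_token_to_word):
    """Sum products of max weights over shared terms; attribute each to a doc word."""
    q = max_term_weights(query_tokens)
    d = max_term_weights(doc_tokens)
    score = 0.0
    per_word = defaultdict(float)
    for term in q.keys() & d.keys():  # only terms in both expansions contribute
        contribution = q[term][0] * d[term][0]
        score += contribution
        # Credit the word that owns the max-weight document token.
        per_word[doc_token_to_word[d[term][1]]] += contribution
    return score, dict(per_word)

# Hypothetical weights, invented for illustration.
query_tokens = {
    "reset": {"reset": 2.0, "password": 0.5},
    "password": {"password": 1.5},
}
doc_tokens = {
    "pass": {"password": 1.0},
    "##word": {"password": 2.0, "login": 0.5},
    "forgot": {"reset": 1.0},
}
# Many-to-1 mapping from word pieces back to the words they came from.
doc_token_to_word = {"pass": "password", "##word": "password", "forgot": "forgot"}

score, per_word = score_and_contributions(query_tokens, doc_tokens, doc_token_to_word)
print(score, per_word)
```

With these made-up numbers the word "password" contributes most to the score, so it would be the first candidate for highlighting under the plan described above.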

The reason we haven't done this yet is that we need to update our inference service to output the maximum weight token for each term, and we want to do this in a way that allows for more general model inputs and outputs.

Mark_Harwood1 · June 3, 2023, 2:04pm

Hi Tom!
Many thanks for the very detailed response.
That makes sense and I appreciate this is early days so some stuff might not be there yet.
Cheers.

system · July 1, 2023, 2:05pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
