What might be wrong with this attempt to highlight ES results?

I have a working Elasticsearch 7.10.2 setup in which stemmed query matches are highlighted in beautiful multiple colours: if the search query has 4 terms, the words in the results that match those 4 terms are highlighted in 4 different colours.

ES 7.16.8 (if memory serves) broke this. I raised an issue with ES HQ, and it turned out to be a regression that others had already spotted in connection with other problems. The problem was later said to have been fixed.

I'm now trying to get ES 8.6.2 to do this same thing (running on port 9500).

This is my mapping, applied to a new index before populating it (other fields omitted):

    mappings = {
        'properties': {
            'text_content': {
                'type': 'text',
                'term_vector': 'with_positions_offsets',
                'fields': {
                    'stemmed': {
                        'type': 'text',
                        'analyzer': 'english',
                        'term_vector': 'with_positions_offsets',
                    }
                }
            }
        }
    }
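For reference, here is a minimal sketch of creating an index with this mapping using only the Python standard library. The base URL and port are assumptions taken from later in the thread (security disabled, plain HTTP on 9500), the index name is taken from the URL in the error message, and `build_index_body` / `create_index` are hypothetical helper names, not part of any library.

```python
import json
import urllib.request

# Assumptions: plain HTTP on port 9500 and this index name (from the error URL).
ES_URL = "http://localhost:9500"
INDEX_NAME = "dev_my_documents"


def build_index_body() -> dict:
    """Create-index body: both the raw field and its stemmed sub-field store
    term vectors with positions and offsets, which the fvh highlighter uses."""
    term_vector = "with_positions_offsets"
    return {
        "mappings": {
            "properties": {
                "text_content": {
                    "type": "text",
                    "term_vector": term_vector,
                    "fields": {
                        "stemmed": {
                            "type": "text",
                            "analyzer": "english",
                            "term_vector": term_vector,
                        }
                    },
                }
            }
        }
    }


def create_index(es_url: str, index_name: str) -> dict:
    """PUT the mapping to a new index and return the parsed JSON response."""
    body = json.dumps(build_index_body()).encode("utf-8")
    request = urllib.request.Request(
        f"{es_url}/{index_name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Against a running cluster this would be called as `create_index(ES_URL, INDEX_NAME)` before populating the index.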

And this is how I run the query after populating the index:

    data = {
        'query': {
            'simple_query_string': {
                'query': self.query_text,
                'fields': ['text_content.stemmed'] 
            }
        },
        'highlight': {
            'fields': {
                'text_content.stemmed': {
                    'type': 'fvh',
                    'pre_tags': [
                        '<span style="background-color: yellow">',
                        '<span style="background-color: skyblue">', 
                        '<span style="background-color: lightgreen">', 
                        '<span style="background-color: plum">', 
                        '<span style="background-color: lightcoral">', 
                        '<span style="background-color: silver">',
                    ],
                    'post_tags': ['</span>', '</span>', '</span>', 
                        '</span>', '</span>', '</span>',]
                }
            },
            'number_of_fragments': 0
        }
    }        
    search_url = f'{ES_URL}/{ALIAS_NAME}/_search'
    headers = {'Content-type': 'application/json'}
    success, deliverable = utilities.process_json_request(search_url, data=json.dumps(data), headers=headers)
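For comparison, a possible standard-library version of the same request. It is a sketch, not the poster's `utilities.process_json_request`, and the function names are hypothetical. It builds `pre_tags` from a list of colours so that `pre_tags` and `post_tags` always have the same length, and hands urllib the encoded bytes so that urllib sets Content-Length itself. It uses POST rather than GET: Elasticsearch's `_search` endpoint accepts both, and POST avoids clients that do not send a body with GET.

```python
import json
import urllib.request

# Colours from the question; any CSS colour names would work.
COLOURS = ["yellow", "skyblue", "lightgreen", "plum", "lightcoral", "silver"]


def build_search_body(query_text: str, field: str = "text_content.stemmed") -> dict:
    """Query plus fvh highlighting with one colour per query term."""
    pre_tags = [f'<span style="background-color: {colour}">' for colour in COLOURS]
    # One closing tag per opening tag, so the lists line up index by index.
    post_tags = ["</span>"] * len(pre_tags)
    return {
        "query": {"simple_query_string": {"query": query_text, "fields": [field]}},
        "highlight": {
            "fields": {
                field: {"type": "fvh", "pre_tags": pre_tags, "post_tags": post_tags}
            },
            # Return the whole field content rather than fragments.
            "number_of_fragments": 0,
        },
    }


def search(es_url: str, index_or_alias: str, body: dict) -> dict:
    """POST the body to _search; urllib computes Content-Length from the bytes."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        f"{es_url}/{index_or_alias}/_search",
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Usage would be something like `search("http://localhost:9500", ALIAS_NAME, build_search_body("kill process run"))`.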

This currently fails with:

    request failed. URL |https://localhost:9500/dev_my_documents/_search| command get deliverable.failure_reason unacceptable status code: 400
    reason: [1:501] [highlight] failed to parse field [fields]

Examining the JSON of the failure response (its json()), I get this:

    {
      "error": {
        "root_cause": [
          {
            "type": "x_content_parse_exception",
            "reason": "[1:500] [highlight_field] failed to parse field [post_tags]"
          }
        ],
        "type": "x_content_parse_exception",
        "reason": "[1:500] [highlight] failed to parse field [fields]",
        "caused_by": {
          "type": "x_content_parse_exception",
          "reason": "[1:500] [fields] failed to parse field [text_content.stemmed]",
          "caused_by": {
            "type": "x_content_parse_exception",
            "reason": "[1:500] [highlight_field] failed to parse field [post_tags]",
            "caused_by": {
              "type": "json_e_o_f_exception",
              "reason": "Unexpected end-of-input in VALUE_STRING\n at [Source: (org.elasticsearch.common.io.stream.ByteBufferStreamInput); line: 1, column: 504]"
            }
          }
        }
      },
      "status": 400
    }
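The informative part of an error like this is usually the innermost `caused_by`. Here it is a `json_e_o_f_exception`, meaning the parser reached the end of the input while still inside a string value. A small hypothetical helper to pull that out of the nested error body:

```python
def deepest_cause(error_body: dict) -> tuple:
    """Follow the nested "caused_by" chain of an Elasticsearch error response
    and return (type, reason) of the innermost cause."""
    err = error_body["error"]
    while "caused_by" in err:
        err = err["caused_by"]
    return err["type"], err["reason"]
```

Applied to the response above, it returns the `json_e_o_f_exception` type and its "Unexpected end-of-input in VALUE_STRING" reason.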

This is exactly the same mapping and query that works with my ES 7.10.2. Can anyone explain what I'm doing wrong now, and how to implement multi-colour highlighting correctly with 8.6.2?

To clarify: if I comment out the whole "highlight" key/value in data, the query runs fine, though obviously with no highlighting at all.

I would recommend testing with the latest release to check if it has indeed been fixed.

@Christian_Dahlqvist
Thanks. I'm downloading 8.13 now.

But the matter was given quite serious attention at the time (March 2022): here is the issue on GitHub.

By the way (I can no longer seem to edit my original post), the following works fine:

            'query': {
                'simple_query_string': {
                    'query': self.query_text,
                    'fields': ['text_content.stemmed'] 
                }
            },
            'highlight': {
                'fields': {
                    'text_content.stemmed': {"pre_tags" : ["<strong>"], "post_tags" : ["</strong>"]},
                },
            }

... this delivers perfectly respectable (monochrome) highlighted HTML markup of the stem-matched results.

Of course, it may be that the regression I found was related to the one others had reported but, for some reason, was not addressed by the fix.

I did find that issue, but it is not labelled with a version. It seems to have been fixed long enough ago to be included in the version you were using, but it is always advisable to verify with the latest release.

I've installed 8.13.1 (with security disabled, so I'm using http://localhost:9500 rather than https).

I now get slightly different errors when attempting to use the fvh highlighter with multi-colour highlighting:

    request failed. URL |http://localhost:9500/dev_my_documents/_search| command get deliverable.failure_reason unacceptable status code: 400
    reason: [1:495] [highlight] failed to parse field [fields]

and the JSON from the failure response:

    {
      "error": {
        "root_cause": [
          {
            "type": "x_content_parse_exception",
            "reason": "[1:495] [highlight_field] failed to parse field [post_tags]"
          }
        ],
        "type": "x_content_parse_exception",
        "reason": "[1:495] [highlight] failed to parse field [fields]",
        "caused_by": {
          "type": "x_content_parse_exception",
          "reason": "[1:495] [fields] failed to parse field [text_content.stemmed]",
          "caused_by": {
            "type": "x_content_parse_exception",
            "reason": "[1:495] [highlight_field] failed to parse field [post_tags]",
            "caused_by": {
              "type": "json_e_o_f_exception",
              "reason": "Unexpected end-of-input in VALUE_STRING\n at [Source: (byte[])\"{\"query\": {\"simple_query_string\": {\"query\": \"kill process run\", \"fields\": [\"text_content.stemmed\"]}}, \"highlight\": {\"fields\": {\"text_content.stemmed\": {\"type\": \"fvh\", \"pre_tags\": [\"<span style=\\\"background-color: yellow\\\">\", \"<span style=\\\"background-color: skyblue\\\">\", \"<span style=\\\"background-color: lightgreen\\\">\", \"<span style=\\\"background-color: plum\\\">\", \"<span style=\\\"background-color: lightcoral\\\">\", \"<span style=\\\"background-color: silver\\\">\"], \"post_tags\": [\"</span>\", \"</span>\", \"</spa\"[truncated 3 bytes]; line: 1, column: 504]"
            }
          }
        }
      },
      "status": 400
    }

"truncated 3 bytes" ... hmmm. The plot thickens.

I tried replacing the single quotes in the "pre_tags" with double quotes, and backslash-escaping the double quotes inside the "span" strings. Same error.

To me it looks as though this might be a low-level error caused by something wrong with the JSON string submitted as data in the request...
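One way to test the "low-level" theory is to serialize the body exactly as the question does (`json.dumps(data)`), measure its length, and compare that with the column in the `json_e_o_f_exception`. If the parser hit end-of-input well before the end of the locally built body, the server received fewer bytes than Python produced. That would point at how the body was sent rather than at the highlight settings. The helper names below are hypothetical; this is a diagnostic sketch, not a fix.

```python
import json
import re
from typing import Optional


def body_length(data: dict) -> int:
    """Bytes in the body as produced by json.dumps with default settings.
    ensure_ascii=True escapes non-ASCII characters, so characters == bytes."""
    return len(json.dumps(data).encode("utf-8"))


def error_column(reason: str) -> Optional[int]:
    """Extract the 'column: N' position from a JSON parse-error message."""
    match = re.search(r"column: (\d+)", reason)
    return int(match.group(1)) if match else None


def body_was_cut_short(data: dict, reason: str) -> bool:
    """End-of-input at column N means the parser ran out after about N - 1
    bytes; if that is less than the full body, the body arrived truncated."""
    column = error_column(reason)
    return column is not None and column - 1 < body_length(data)
```

In the 8.13.1 error above, the echoed body stops inside the third "</span>" and never reaches "number_of_fragments", which is consistent with a body that was cut off before it reached the server.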

Also, text_content.stemmed is (as I understand it) not a "stored" field, just a "searchable" one. I wonder whether that might have something to do with it, although since everything works fine in 7.10.2, this seems unlikely.
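On the stored-field question: as far as the Elasticsearch highlighting docs go, the fvh highlighter needs the field mapped with `term_vector: with_positions_offsets` (which this mapping sets on `text_content.stemmed`), not `store: true`; unstored fields are highlighted from `_source`. The hypothetical helper below checks that setting against the `properties` section of a mapping, for example the one under `<index>.mappings.properties` in the response from `GET /<index>/_mapping`.

```python
from typing import Optional

# The term_vector setting the fvh highlighter documentation calls for.
FVH_TERM_VECTOR = "with_positions_offsets"


def field_mapping(properties: dict, dotted_path: str) -> Optional[dict]:
    """Look up a field such as "text_content.stemmed" in a mapping's
    "properties", following multi-fields ("fields") or object sub-properties."""
    first, *rest = dotted_path.split(".")
    node = properties.get(first)
    for part in rest:
        if node is None:
            return None
        # A dotted name may be a multi-field or a sub-property of an object field.
        node = node.get("fields", {}).get(part) or node.get("properties", {}).get(part)
    return node


def supports_fvh(properties: dict, dotted_path: str) -> bool:
    """True if the field exists and stores term vectors with positions and offsets."""
    mapping = field_mapping(properties, dotted_path)
    return mapping is not None and mapping.get("term_vector") == FVH_TERM_VECTOR
```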

Madness. I finally found the solution: using single quotes (in Python) to construct the query dict was the problem!

I replaced them with double quotes and the problem went away. Quite strange.
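For anyone landing here with the same error, one thing may be worth checking. In Python, a string written with single quotes and the same string written with double quotes are identical values, and `json.dumps` always writes JSON strings with double quotes. So the quote style alone should not change the request body. The minimal check below uses a fragment of the query above. If swapping quotes appears to fix things, it is probably worth looking at what else changed in the same edit, for example how the body is passed to the HTTP call.

```python
import json

# The same highlight fragment written with single and with double quotes.
single = {'highlight': {'fields': {'text_content.stemmed': {
    'type': 'fvh',
    'pre_tags': ['<span style="background-color: yellow">'],
    'post_tags': ['</span>']}}}}
double = {"highlight": {"fields": {"text_content.stemmed": {
    "type": "fvh",
    "pre_tags": ["<span style=\"background-color: yellow\">"],
    "post_tags": ["</span>"]}}}}

# Quote style only affects the Python source, not the resulting str values.
same_dict = single == double
# json.dumps emits double-quoted JSON strings either way, so the bodies match.
same_json = json.dumps(single) == json.dumps(double)
print(same_dict, same_json)  # True True
```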