Elasticsearch Highlight the result of script fields

RabBit_BR · October 9, 2022, 1:15pm

I thought of another solution: index two fields, the original html and an html_extract field that holds only the text.
Use an ingest pipeline with the html_strip processor so that only the text of the message is indexed into html_extract. Highlighting on that field will then work.

Mapping

PUT idx_html_strip
{
  "mappings": {
    "properties": {
      "html": {
        "type": "text"
      },
      "html_extract": {
        "type": "text"
      }
    }
  }
}

Processor Pipeline

PUT /_ingest/pipeline/pipe_html_strip
{
  "description": "Strip HTML from html into html_extract and replace newlines with spaces",
  "processors": [
    {
      "html_strip": {
        "field": "html",
        "target_field": "html_extract"
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "ctx['html_extract'] = ctx['html_extract'].replace('\n',' ').trim()"
      }
    }
  ]
}
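To preview roughly what will end up in html_extract before indexing anything, here is a local approximation using only the Python standard library. It is not the Lucene HTML stripper that the html_strip processor uses, so entities and unusual markup may come out differently; the set of block tags treated as line breaks and the helper name preview_html_extract are assumptions for illustration.

```python
from html.parser import HTMLParser

# Assumption: tags treated as line breaks, similar in spirit to how an
# HTML-stripping filter separates blocks of text. Not the exact Lucene list.
BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr"}


class _TextExtractor(HTMLParser):
    """Collects text content and drops tags; <script>/<style> content is skipped."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def preview_html_extract(html: str) -> str:
    """Hypothetical helper: strip tags, turn newlines into spaces, trim."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = "".join(parser.parts)
    # Like the script processor, newlines become spaces; collapsing repeated
    # whitespace is an extra step of this sketch, not part of the pipeline.
    return " ".join(text.replace("\n", " ").split())


if __name__ == "__main__":
    sample = ('<html><body><h1 style="font-family: Arial">Test</h1> '
              '<span><strong>More</strong> test</span></body></html>')
    print(preview_html_extract(sample))  # Test More test
```

For the sample document used below this prints "Test More test", which matches the html_extract value in the results.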

Index Data

Note the use of ?pipeline=pipe_html_strip in the request. Without it the pipeline does not run and html_extract is never populated.

POST idx_html_strip/_doc?pipeline=pipe_html_strip
{
  "html": """<html><body><h1 style=\"font-family: Arial\">Test</h1> <span><strong>More</strong> test</span></body></html>"""
}
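If you have many documents, the _bulk API also accepts the same ?pipeline=pipe_html_strip parameter (alternatively, the index setting index.default_pipeline applies a pipeline to every write). Below is a minimal sketch that only builds the NDJSON body; the helper name build_bulk_body is hypothetical, and sending it (POST _bulk?pipeline=pipe_html_strip with Content-Type: application/x-ndjson) is left to your client.

```python
import json
from typing import Iterable


def build_bulk_body(index: str, html_docs: Iterable[str]) -> str:
    """Hypothetical helper: one action line plus one source line per document."""
    lines = []
    for html in html_docs:
        # Action line: which index the following document goes to.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: only "html" is sent; the pipeline fills html_extract.
        lines.append(json.dumps({"html": html}))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body("idx_html_strip", ["<p>first</p>", "<p>second</p>"]), end="")
```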

Query

GET idx_html_strip/_search?filter_path=hits.hits._source,hits.hits.highlight
{
  "query": {
    "multi_match": {
      "query": "More",
      "fields": ["html", "html_extract"]
    }
  },
  "highlight": {
    "fields": {
      "*": { "pre_tags": ["<strong>"], "post_tags": ["</strong>"] }
    }
  }
}

Results

{
  "hits": {
    "hits": [
      {
        "_source": {
          "html": """<html><body><h1 style=\"font-family: Arial\">Test</h1> <span><strong>More</strong> test</span></body></html>""",
          "html_extract": "Test More test"
        },
        "highlight": {
          "html": [
            """<html><body><h1 style=\"font-family: Arial\">Test</h1> <span><strong><strong>More</strong></strong> test</span></body>"""
          ],
          "html_extract": [
            "Test <strong>More</strong> test"
          ]
        }
      }
    ]
  }
}
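In the results, highlighting the raw html field nests the highlight tags inside the existing markup (<strong><strong>More</strong></strong>), while html_extract comes back clean. So one option is to keep matching on both fields but request highlights only from html_extract. A minimal Python sketch of that request body follows; the function names and the http://localhost:9200 address are assumptions for illustration.

```python
import json
import urllib.request


def build_search_body(text: str) -> dict:
    """Match on both fields, but highlight only the tag-free html_extract."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["html", "html_extract"],
            }
        },
        "highlight": {
            "fields": {
                # Only html_extract is highlighted, so the <strong> markers
                # cannot nest inside the original HTML markup.
                "html_extract": {
                    "pre_tags": ["<strong>"],
                    "post_tags": ["</strong>"],
                }
            }
        },
    }


def search(text: str, base_url: str = "http://localhost:9200") -> dict:
    """POST the body to idx_html_strip/_search (base_url is an assumption)."""
    req = urllib.request.Request(
        f"{base_url}/idx_html_strip/_search",
        data=json.dumps(build_search_body(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The _source still returns the original html, so you can render the page from html and show the snippet from highlight.html_extract.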

Related topics

Topic | Category | Replies | Views | Activity
Html stripped highlighted text from html Content field | Elasticsearch | 9 | 2970 | July 6, 2017
Highlight fragments of fields that use the html_strip char filter still contain HTML tags | Elasticsearch | 4 | 74 | August 27, 2024
Highlighting leads to html tags overlap | Elasticsearch | 5 | 3223 | September 14, 2018
Order of operations wrt highlight and a script | Elasticsearch | 2 | 733 | July 5, 2017
Highlight does not work properly with ScriptScore, version 7.8 | Elasticsearch | 1 | 407 | August 14, 2020