Highlighting not working correctly with the greek analyzer

Hello everyone,

I have a problem with highlighting in Elasticsearch. One of my indices has a field mapped like this:

"skepseis": {
	"type": "nested", 
	"properties": {
	  "text": {
		"type": "text",
		"analyzer": "greek"
	  }
	}
}

When a user searches for "word1 word2", where word1 and word2 are only the beginnings of the full words, the correct document is returned, but Elasticsearch highlights only one of the two words.

For example, when the user searches for "ΣΥΜΒΟΥΛ ΕΠΙΚΡΑΤ" instead of "ΣΥΜΒΟΥΛΙΟ ΕΠΙΚΡΑΤΕΙΑΣ" (Council of State), Elasticsearch highlights only ΣΥΜΒΟΥΛΙΟ and not ΕΠΙΚΡΑΤΕΙΑΣ.

I think the problem has to do with how the two words are stemmed, but I don't know what to change to solve it.
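To see why stemming can produce exactly this symptom, here is a toy model of term-based highlighting: a word is highlighted only when its analyzed form equals the analyzed form of some query word. `toy_stem` is a hypothetical stand-in (casefold, then drop a trailing "ιο"), NOT the real greek stemmer, and `toy_highlight` is not how Lucene's unified highlighter is implemented; the sketch only illustrates the matching rule.

```python
import re
from typing import Callable


def toy_stem(word: str) -> str:
    """Hypothetical stemmer: casefold, then strip a trailing "ιο".
    NOT the Elasticsearch greek stemmer; chosen only to mimic the symptom."""
    word = word.casefold()
    return word[:-2] if word.endswith("ιο") else word


def toy_highlight(text: str, query: str, analyze: Callable[[str], str]) -> str:
    """Wrap a word of `text` in <em> tags when its analyzed form equals
    the analyzed form of some word in `query` (toy model only)."""
    query_terms = {analyze(word) for word in query.split()}

    def mark(match):
        word = match.group(0)
        return f"<em>{word}</em>" if analyze(word) in query_terms else word

    # \w is Unicode-aware for str patterns, so it matches Greek letters.
    return re.sub(r"\w+", mark, text)


print(toy_highlight("ΣΥΜΒΟΥΛΙΟ ΕΠΙΚΡΑΤΕΙΑΣ", "ΣΥΜΒΟΥΛ ΕΠΙΚΡΑΤ", toy_stem))
# <em>ΣΥΜΒΟΥΛΙΟ</em> ΕΠΙΚΡΑΤΕΙΑΣ
```

Under this toy rule, a query that needs only one matching term still returns the document, yet only the word whose stem lines up gets highlighted, which mirrors what is described above.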

The relevant part of the query for highlighting is:

"inner_hits": {
	"name": "matching_skepseis",
	"size": 10,
	"highlight": {
		"fields": {
			"skepseis.text": {"type": "unified"}
		}
	}
}
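For context, this fragment normally sits inside a nested query. A minimal sketch of a complete request body built in Python, assuming the user input goes into a plain `match` query on `skepseis.text` (the post does not show the query part, so that choice and the function name are hypothetical):

```python
import json
from typing import Any, Dict


def build_skepseis_query(user_input: str) -> Dict[str, Any]:
    """Hypothetical full search body around the inner_hits fragment.
    The `match` query is an assumption; only inner_hits comes from the post."""
    return {
        "query": {
            "nested": {
                "path": "skepseis",
                # Assumed query part: match on the greek-analyzed text field.
                "query": {"match": {"skepseis.text": user_input}},
                "inner_hits": {
                    "name": "matching_skepseis",
                    "size": 10,
                    "highlight": {
                        "fields": {"skepseis.text": {"type": "unified"}}
                    },
                },
            }
        }
    }


print(json.dumps(build_skepseis_query("ΣΥΜΒΟΥΛ ΕΠΙΚΡΑΤ"), ensure_ascii=False, indent=2))
```

If the real query is a `match` query, note that its default operator is `or`, so a document is returned when only one of the terms matches, and only that term can then be highlighted.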

Any suggestions?

Using the _analyze API, you can check how the text of your original document gets analyzed:

GET <your_index>/_analyze
{
  "analyzer": "greek",
  "text": "ΣΥΜΒΟΥΛΙΟ ΕΠΙΚΡΑΤΕΙΑΣ"
}

Do you get different output when you run the _analyze API on "ΣΥΜΒΟΥΛ ΕΠΙΚΡΑΤ"?

If so, you need to adjust your analyzer so that it produces the same tokens for both variations.
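To make that comparison systematic, a small sketch that builds the _analyze request body and diffs the tokens of two responses. It assumes the responses are the parsed JSON returned by the _analyze API (a `tokens` list whose entries carry a `token` key); sending the requests is left out, and the helper names are hypothetical.

```python
from itertools import zip_longest
from typing import Any, Dict, List, Optional, Tuple


def analyze_body(text: str, analyzer: str = "greek") -> Dict[str, str]:
    """Body for GET <your_index>/_analyze, as in the request above."""
    return {"analyzer": analyzer, "text": text}


def token_list(response: Dict[str, Any]) -> List[str]:
    """Token strings, in order, from a parsed _analyze response."""
    return [entry["token"] for entry in response["tokens"]]


def token_diff(
    response_a: Dict[str, Any], response_b: Dict[str, Any]
) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """(index, token_a, token_b) for every slot where the token streams differ.
    A token missing on either side shows up as None."""
    pairs = zip_longest(token_list(response_a), token_list(response_b))
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]
```

An empty list from `token_diff` means both variations analyze to the same tokens; any entry points at the word whose stem differs. Assuming elasticsearch-py 7.x, each body can be sent with `es.indices.analyze(index="<your_index>", body=analyze_body(...))`.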

Yes, they produce different tokens. Do you have any idea how to adjust the analyzer? I'm using the built-in greek analyzer.

Thank you for your help.

You can NOT adjust the greek analyzer; that is simply how it does stemming.

It looks like you are searching with prefixes. Consider using the "search_as_you_type" field type instead, which supports searching by prefix.

"skepseis2": {
  "type": "nested",
  "properties": {
    "text": {
      "type": "search_as_you_type",
      "analyzer": "greek"
    }
  }
}
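As background on why this field type helps with prefixes: according to the Elasticsearch docs, a search_as_you_type field adds the subfields `._2gram` and `._3gram` (shingles of 2 and 3 consecutive tokens) and `._index_prefix` (edge n-grams built on top of the 3-gram shingles), so prefixes can be matched against terms already in the index. The sketch below only imitates those two transformations on a list of already-analyzed tokens; it is a simplification, not the Lucene token filters, and the function names are hypothetical.

```python
from typing import List


def shingles(tokens: List[str], size: int) -> List[str]:
    """Word n-grams of exactly `size` consecutive tokens, joined by a space.
    Roughly what the ._2gram (size=2) and ._3gram (size=3) subfields index."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def edge_ngrams(term: str, min_gram: int = 1) -> List[str]:
    """Leading substrings of `term`, shortest first. Roughly what ._index_prefix
    adds, so a prefix can be looked up as an ordinary indexed term."""
    return [term[:k] for k in range(min_gram, len(term) + 1)]


tokens = ["t1", "t2", "t3"]  # placeholder tokens standing in for analyzer output
print(shingles(tokens, 2))   # ['t1 t2', 't2 t3']
print(shingles(tokens, 3))   # ['t1 t2 t3']
print(edge_ngrams("abc"))    # ['a', 'ab', 'abc']
```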

Then your query could be:

GET /_search
{
  "query": {
    "nested": {
      "path": "skepseis2",
      "query": {
        "multi_match": {
          "query": "ΣΥΜΒΟΥΛ ΕΠΙΚΡΑΤ",
          "type": "bool_prefix",
          "fields": [
            "skepseis2.text",
            "skepseis2.text._2gram",
            "skepseis2.text._3gram"
          ]
        }
      },
      "inner_hits": {
        "name": "matching_skepseis2",
        "size": 10,
        "highlight": {
          "fields": {
            "skepseis2.text": {
              "type": "unified"
            }
          }
        }
      }
    }
  }
}
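The `bool_prefix` type is what makes the prefixes work. Per the docs, `match_bool_prefix` analyzes the input and builds a `bool` query whose `should` clauses are a `term` query for every term except the last and a `prefix` query for the last one; `multi_match` with `"type": "bool_prefix"` does this for each listed field and scores like `most_fields`. The sketch below reproduces that shape for a single field. To stay self-contained it splits on whitespace and casefolds instead of running the field's greek analyzer, so the terms it emits are not the ones Elasticsearch would actually produce; the function name is hypothetical.

```python
from typing import Any, Dict, List


def bool_prefix_clauses(query: str, field: str) -> Dict[str, Any]:
    """Approximate the bool query that match_bool_prefix builds for one field.

    Simplification: terms come from a whitespace split plus casefold(),
    whereas Elasticsearch uses the field's search analyzer (here: greek).
    """
    terms: List[str] = [term.casefold() for term in query.split()]
    if not terms:
        # No terms: nothing to match (behaviour chosen for this sketch).
        return {"bool": {"should": []}}
    # Every term except the last must match a whole indexed term...
    should: List[Dict[str, Any]] = [{"term": {field: term}} for term in terms[:-1]]
    # ...while the last one only needs to be a prefix of an indexed term.
    should.append({"prefix": {field: terms[-1]}})
    return {"bool": {"should": should}}


print(bool_prefix_clauses("ΣΥΜΒΟΥΛ ΕΠΙΚΡΑΤ", "skepseis2.text"))
```

In other words, only the last word of the input is treated as a prefix; the words before it still have to match complete (analyzed) terms.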

Thank you very much for your reply. I hope this solves the problem.

Unfortunately, with the search_as_you_type field type, ingestion fails with the following error (I suppose because of the length of the text):

TransportError(429, 'circuit_breaking_exception', '[parent] Data too large, data for [<http_request>] would be [2090148108/1.9gb], which is larger than the limit of [2040109465/1.8gb], real usage: [2088466136/1.9gb], new bytes reserved: [1681972/1.6mb], usages [request=416530432/397.2mb, fielddata=4958016/4.7mb, in_flight_requests=1681972/1.6mb, model_inference=0/0b, accounting=2335104/2.2mb]')
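The numbers in this message can be checked against each other. A sketch that extracts the byte counts with regular expressions (patterns tailored to this particular message format) and shows that the reported total is the node's real heap usage plus the new request. Dividing the limit by 0.95 gives almost exactly 2 GiB, which is consistent with a 2 GiB heap under the default parent breaker limit of 95% (`indices.breaker.total.limit` when `indices.breaker.total.use_real_memory` is enabled, its default); that heap size is an inference, not something the message states.

```python
import re
from typing import Dict

# Patterns tailored to the parent-breaker message quoted above.
PATTERNS = {
    "would_be": r"would be \[(\d+)/",
    "limit": r"limit of \[(\d+)/",
    "real_usage": r"real usage: \[(\d+)/",
    "new_bytes": r"new bytes reserved: \[(\d+)/",
}


def parse_breaker_message(message: str) -> Dict[str, int]:
    """Return the byte counts from a parent circuit_breaking_exception message."""
    values = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"{name!r} not found in message")
        values[name] = int(match.group(1))
    return values


MESSAGE = (
    "[parent] Data too large, data for [<http_request>] would be [2090148108/1.9gb], "
    "which is larger than the limit of [2040109465/1.8gb], real usage: "
    "[2088466136/1.9gb], new bytes reserved: [1681972/1.6mb]"
)

v = parse_breaker_message(MESSAGE)
print(v["real_usage"] + v["new_bytes"] == v["would_be"])  # True
print(v["real_usage"] > v["limit"])                       # True
# Heap estimate in GiB, assuming the default 95% parent limit: prints 2.0
print(round(v["limit"] / 0.95 / 2**30, 2))
```

Real usage alone is already above the limit, so at that moment any incoming request, however small, would trip the parent breaker.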