Elasticsearch Highlighting Issue with fragmenter


(Nikesh) #1

I am having a problem with how Elasticsearch highlights text.
Example:

  1. When using the "unified" highlighter, i.e.,

    "highlight": {
        "type": "unified",
        "fields": {
            "*": {}
        }
    }

the result is:

"highlight": {
                    "FILE_CONTENT": [
                        "follows: \n1)Test Clause\n1.1) test 29_May\nDummy values - \nCurrency is - INR  \nContract Category is - <em>Ball</em>",
                        "<em>Bearings</em> \nStart Date is - 05/31/2018 \nhgdhdrgdffggggggggggggggggggggggggggggggggggggggggggggggjerklwagggggggggggggggggggggggggggggggggg"
                    ]
                }

We can see that "Ball" and "Bearings" are both highlighted, but they are returned as separate elements of the highlight array.
  2. The same request with the "fvh" highlighter solves the issue:

"highlight": {
        "type" : "fvh",
        "fields": {
            "*": {}
        }
    }

The highlighting for this is:

 "highlight": {
                    "FILE_CONTENT": [
                        "Currency is - INR  \nContract Category is - <em>Ball Bearings</em> \nStart Date is - 05/31/2018 \nhgdhdrgdffgggg"
                    ]
                }

For performance reasons, I am not able to use the fvh highlighter; I am restricted to the unified highlighter.
According to the Elasticsearch reference, "fragmenter": "span" should solve this issue. But I get the same result with "fragmenter": "span" as without it.

"highlight": {
        "type" : "unified",
        "fragmenter": "span",
        "fields": {
            "*": {}
        }
    }

The results are:

"highlight": {
 "FILE_CONTENT": [
                        "follows: \n1)Test Clause\n1.1) test 29_May\nDummy values - \nCurrency is - INR  \nContract Category is - <em>Ball</em>",
                        "<em>Bearings</em> \nStart Date is - 05/31/2018 \nhgdhdrgdffggggggggggggggggggggggggggggggggggggggggggggggjerklwagggggggggggggggggggggggggggggggggg"
                    ] 
}
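
To make the desired output concrete, here is a minimal client-side sketch in Python. It is only a workaround applied after the search response comes back, not an Elasticsearch feature. The helper name `merge_adjacent_highlights` is hypothetical, and the sketch assumes the default `<em>`/`</em>` tags and that the fragments were adjacent in the source text, so joining them with a single space is acceptable.

```python
import re


def merge_adjacent_highlights(fragments, pre="<em>", post="</em>"):
    """Join highlight fragments and fuse highlights separated only by whitespace.

    Assumption: the fragments were contiguous in the original field value,
    so a single space is a reasonable separator when joining them.
    """
    joined = " ".join(fragments)
    # A closing tag followed only by whitespace and an opening tag means two
    # neighbouring terms were highlighted separately; keep just the whitespace.
    pattern = re.escape(post) + r"(\s+)" + re.escape(pre)
    return re.sub(pattern, r"\1", joined)


if __name__ == "__main__":
    fragments = [
        "Contract Category is - <em>Ball</em>",
        "<em>Bearings</em> \nStart Date is - 05/31/2018",
    ]
    print(merge_adjacent_highlights(fragments))
```

This returns one string instead of a list of fragments, which is closer to what the fvh highlighter produces for the same phrase. It does not change how the unified highlighter splits the text on the server.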

(Nikesh) #2

@elastic Please let me know if there is a solution to this.