Elasticsearch Highlighting Issue with fragmenter


(Nikesh) #1

I am having an issue with how Elasticsearch highlights text.
Example:

  1. When using the "unified" highlighter, i.e.,

    "highlight": {
        "type": "unified",
        "fields": {
            "*": {}
        }
    }

the result is:

"highlight": {
                    "FILE_CONTENT": [
                        "follows: \n1)Test Clause\n1.1) test 29_May\nDummy values - \nCurrency is - INR  \nContract Category is - <em>Ball</em>",
                        "<em>Bearings</em> \nStart Date is - 05/31/2018 \nhgdhdrgdffggggggggggggggggggggggggggggggggggggggggggggggjerklwagggggggggggggggggggggggggggggggggg"
                    ]
                }

We can see that "Ball" and "Bearings" are both highlighted, but they are returned as separate elements of the highlight array.
  2. Using the "fvh" highlighter for the same search solves the issue:

"highlight": {
        "type" : "fvh",
        
        "fields": {
            "*": {}
        }
    }

The highlighting for this is:

 "highlight": {
                    "FILE_CONTENT": [
                        "Currency is - INR  \nContract Category is - <em>Ball Bearings</em> \nStart Date is - 05/31/2018 \nhgdhdrgdffgggg"
                    ]
                }

Because of speed issues, I cannot use the fvh highlighter; I am restricted to the unified highlighter.
According to the Elasticsearch reference, "fragmenter": "span" should solve this issue, but I get the same result as when not using "fragmenter": "span".

"highlight": {
        "type" : "unified",
        "fragmenter": "span",
        "fields": {
            "*": {}
        }
    }

The results are:

"highlight": {
 "FILE_CONTENT": [
                        "follows: \n1)Test Clause\n1.1) test 29_May\nDummy values - \nCurrency is - INR  \nContract Category is - <em>Ball</em>",
                        "<em>Bearings</em> \nStart Date is - 05/31/2018 \nhgdhdrgdffggggggggggggggggggggggggggggggggggggggggggggggjerklwagggggggggggggggggggggggggggggggggg"
                    ] 
}

(Nikesh) #2

@elastic Please let me know if there is a solution to this.


(Nikesh) #3

@elastic Could you please respond to this query? I have waited longer than the expected time for a response.


(David Pilato) #4

Maybe @jimczi has an idea?


(Jimferenczi) #5

The unified highlighter does not handle different fragmenters. However, you can set the fragment size to -1 in order not to break sentences, or set the fragment size to a value greater than the default (150 chars). We are also working on adding different fragmenters for the unified highlighter, but this is not ready yet.


(Jimferenczi) #6

Sorry, if you don't want to break sentences you need to set the fragment size to 0, not -1 as I said in my previous comment.
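
For reference, a minimal sketch of how that suggestion might be applied to the request from the first post. It assumes `fragment_size` is placed at the top level of `highlight`, next to `type`, so that it applies to every matched field. Only the highlight portion of the request is shown, as in the original post, and the result has not been verified against the poster's data.

```json
"highlight": {
    "type": "unified",
    "fragment_size": 0,
    "fields": {
        "*": {}
    }
}
```

To try the other suggestion instead, `fragment_size` would be set to a value larger than the default rather than to 0.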

