Highlighter trims a field, why?

ebuildy · December 9, 2015, 6:00pm

I use the default mapping:

PUT /tom/test/1
{
   "title" : "tom",
   "url" : "http://www.fetedelascience.fr/",
   "url2" : "http://www.fetedelascience.fr/3"
}

POST /tom/_search
{
   "query": {
      "match": {
         "title": "tom"
      }
   },
   "highlight": {
      "no_match_size": 155,
      "fields": {
         "url": {},
         "url2": {}
      }
   }
}

Gives me:

{
    "_index": "tom",
    "_type": "test",
    "_id": "1",
    "_score": 0.30685282,
    "_source": {
       "title": "tom",
       "url": "http://www.fetedelascience.fr/",
       "url2": "http://www.fetedelascience.fr/3"
    },
    "highlight": {
       "url2": [
          "http://www.fetedelascience.fr/3"
       ],
       "url": [
          "http://www.fetedelascience.fr"
       ]
    }
}

Why has the trailing slash of url disappeared?

nik9000 · December 9, 2015, 7:38pm

Weird. It's just how the no_match segmenter works in the plain highlighter: it grabs text ending at the last token before the end of the text. I wrote this many years ago to simulate how the plain highlighter does segmentation when it finds hits, but it looks like it's wrong. This is a bug, but I don't think it'll be too high on my priority list, sadly.
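
One way to see where the last token ends is to run the text through the analyze API. This is only a sketch, and it assumes the url field uses the standard analyzer (the default for a string field with no explicit mapping):

```
GET /_analyze
{
  "analyzer": "standard",
  "text": "http://www.fetedelascience.fr/"
}
```

Each token in the response carries a start_offset and an end_offset. With the standard analyzer the trailing "/" should not belong to any token, so a fragment cut at the last token's end_offset would stop just before it. In url2 the final "3" is itself a token, which would explain why that value comes back intact.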

ebuildy · December 9, 2015, 7:53pm

No problem, happy to find the reason at least!

Is this something written in ES rather than in Lucene, here: https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/search/highlight ?

Do you think using the fast vector or postings highlighter could fix it?

Thank you,

nik9000 · December 9, 2015, 8:25pm

Yeah, I'm aware. To varying degrees they are able to delegate down to the Lucene bits.

They all implement the process differently. You should try the fvh; it's more likely to work. The postings highlighter isn't going to do what you want unless you feed it complete sentences.
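
A minimal sketch of trying the fvh, assuming a 2015-era Elasticsearch (string fields, mapping types) and reusing the index and field names from the first post. The fvh needs term vectors with positions and offsets, which have to be set in the mapping before the documents are indexed, so the index must be recreated and the data re-indexed:

```
PUT /tom
{
  "mappings": {
    "test": {
      "properties": {
        "url":  { "type": "string", "term_vector": "with_positions_offsets" },
        "url2": { "type": "string", "term_vector": "with_positions_offsets" }
      }
    }
  }
}

POST /tom/_search
{
  "query": { "match": { "title": "tom" } },
  "highlight": {
    "no_match_size": 155,
    "fields": {
      "url":  { "type": "fvh" },
      "url2": { "type": "fvh" }
    }
  }
}
```

Whether the trailing slash actually survives with the fvh is not confirmed anywhere in this thread, so treat this as something to test rather than a known fix.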

ebuildy · December 9, 2015, 8:32pm

Just curious, why not rely on the Lucene highlighter?

I will test fvh tomorrow, after re-indexing my data.

Thank you

nik9000 · December 9, 2015, 9:11pm

Lucene doesn't have support for no_match_size. Most of the code Elasticsearch has for highlighting is really just there to adapt the API to Lucene's highlighters. no_match_size is kind of an anomaly in that it's trying to implement something without upstreaming it. And I'm not 100% sure why I didn't upstream the change at the time.
