I used highlighting with the 'boundary_chars' parameter, but the fragments are not always truncated at boundary characters.
Example:
PUT wpz_2
PUT wpz_2/_mapping/test
{
  "properties": {
    "test": {
      "analyzer": "index_ansj",
      "type": "string",
      "term_vector": "with_positions_offsets"
    }
  }
}
PUT wpz_2/test/3
{
  "test": ",全市工业80%以上的大型装备实现了信息化集成。投资2000万元启动“智慧企业”专项行动,重点支持工业企业无线、物联技术应用。"
}
GET wpz_2/test/_search
{
  "query": {
    "match": {
      "test": "智慧"
    }
  },
  "highlight": {
    "boundary_chars": ".,!?;,。?!",
    "fragment_size": 30,
    "fields": {
      "test": {}
    }
  }
}
Output:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.057534903,
    "hits": [
      {
        "_index": "wpz_2",
        "_type": "test",
        "_id": "3",
        "_score": 0.057534903,
        "_source": {
          "test": ",全市工业80%以上的大型装备实现了信息化集成。投资2000万元启动“智慧企业”专项行动,重点支持工业企业无线、物联技术应用。"
        },
        "highlight": {
          "test": [
            "集成。投资2000万元启动“<em>智慧</em>企业”专项行动,重点支持工业企业无线、物联技术应用"
          ]
        }
      }
    ]
  }
}
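The highlight fragment is not truncated at '。' (its third character); instead it starts two characters earlier, in the middle of the previous sentence.
I also tried the standard analyzer, and the result is the same. Is there a way to always get fragments truncated at boundary characters? Here, what I expect is: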
"投资2000万元启动“<em>智慧</em>企业”专项行动,重点支持工业企业无线、物联技术应用"
I'm using Elasticsearch 2.3.3.
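To make the expectation concrete, here is a minimal Python sketch of the trimming I have in mind, done client-side on the returned fragment. It is only an illustration of the desired result, not how Elasticsearch implements highlighting: the helper name trim_leading_partial_sentence is hypothetical, and it assumes the default <em> pre-tag and the same boundary characters as in the request above.

```python
def trim_leading_partial_sentence(fragment, boundary_chars, pre_tag="<em>"):
    """Drop text up to and including the last boundary character
    that appears before the first highlighted term.

    Hypothetical client-side helper, not part of Elasticsearch.
    """
    first_hit = fragment.find(pre_tag)
    if first_hit == -1:
        # Nothing is highlighted, so leave the fragment unchanged.
        return fragment
    # Position of the last boundary character before the first highlight
    # (-1 when there is none, which keeps the whole fragment).
    cut = max(
        (fragment.rfind(ch, 0, first_hit) for ch in boundary_chars),
        default=-1,
    )
    return fragment[cut + 1:]


BOUNDARY_CHARS = ".,!?;,。?!"  # same value as in the search request
fragment = "集成。投资2000万元启动“<em>智慧</em>企业”专项行动,重点支持工业企业无线、物联技术应用"
trimmed = trim_leading_partial_sentence(fragment, BOUNDARY_CHARS)
print(trimmed)
```

Applied to the fragment from the output above, this drops the leading "集成。" and yields the string I expect. Only the leading part is trimmed; boundary characters after the highlighted term are left alone.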
Best regards,
Abel