I am trying to upgrade from 0.20.x to 0.90.0 and I am facing the following
problem.
I have an index where I keep some blog posts. When I search for a
blog post I use elasticsearch highlighting. This was working perfectly with
versions prior to 0.90.0.
In the latest version, elasticsearch does not return the highlight element
along with the hits. When I use the fast-vector-highlighter,
everything works fine.
I managed to narrow down the cause of this problem and found out that
highlighting with the plain highlighter does not work when I use a
pattern tokenizer. These are the declarations of the analyzer:
my_analyzer:
  type: custom
  tokenizer: more-whitespace
more-whitespace:
  type: pattern
  pattern: ([ \n\t.,!\/"']+)
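To make the tokenizer's behaviour concrete, here is a rough Python approximation of how this pattern splits text. It is only a sketch, not Elasticsearch code: it assumes the pattern tokenizer treats matches as separators (its default mode, as far as I know) and drops empty tokens, and it removes the capturing group from the pattern because Python's re.split would otherwise return the separators as well. The function name tokenize is made up for the illustration.

```python
import re

# Assumption: same character class as the "more-whitespace" pattern, with the
# capturing group removed and the slash left unescaped (Python does not need \/).
SEPARATORS = re.compile(r"""[ \n\t.,!/"']+""")


def tokenize(text):
    """Return (token, start_offset, end_offset) for each run of text between separators."""
    tokens = []
    start = 0
    for match in SEPARATORS.finditer(text):
        # Emit the text before this separator, skipping empty tokens.
        if match.start() > start:
            tokens.append((text[start:match.start()], start, match.start()))
        start = match.end()
    # Emit whatever follows the last separator.
    if start < len(text):
        tokens.append((text[start:], start, len(text)))
    return tokens


print(tokenize("Some interesting body"))
# [('Some', 0, 4), ('interesting', 5, 16), ('body', 17, 21)]
```

The start and end offsets are what a highlighter needs in order to wrap the matched term in the pre and post tags.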
Here is how I reproduce the problem:
curl -XPUT 'http://localhost:9200/test/?pretty' -d '{
  "index" : true,
  "settings": {
    "number_of_shards" : 1
  },
  "default": {
    "include_in_all" : true
  },
  "mappings" : {
    "blog_post" : {
      "type" : "object",
      "_all" : {"enabled" : true, "analyzer" : "my_analyzer"},
      "_source": {"enabled" : true, "compress" : true},
      "properties": {
        "title": {
          "type" : "string",
          "index" : "analyzed",
          "analyzer" : "my_analyzer",
          "include_in_all" : true
        },
        "body": {
          "type" : "string",
          "index" : "analyzed",
          "analyzer" : "my_analyzer",
          "include_in_all" : true
        }
      }
    }
  }
}'
{
  "ok" : true,
  "acknowledged" : true
}
curl -XPUT 'http://localhost:9200/test/blog_post/1' -d '{
"title" : "Some interesting title",
"body" : "Some interesting body"
}'
{"ok":true,"_index":"test","_type":"blog_post","_id":"1","_version":1}
curl -XGET 'http://localhost:9200/test/blog_post/_search?routing=arkon&pretty' -d '{
  "query": {
    "match" : {
      "_all" : {
        "query" : "interesting"
      }
    }
  },
  "highlight" : {
    "pre_tags" : ["<span class=\"highlight\">"],
    "post_tags" : [""],
    "fields" : {"body" : {}, "title": {}}
  }
}'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.16273327,
    "hits" : [ {
      "_index" : "test",
      "_type" : "blog_post",
      "_id" : "1",
      "_score" : 0.16273327, "_source" : {
        "title" : "Some interesting title",
        "body" : "Some interesting body"
      }
    } ]
  }
}
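For checking this programmatically, for example when comparing versions during an upgrade, a minimal Python sketch follows. It is not part of my setup: the helper name hits_missing_highlight is hypothetical, and the sample string is a trimmed-down stand-in for a search response.

```python
import json


def hits_missing_highlight(response_text):
    """Return the _id of every hit that has no non-empty "highlight" element."""
    response = json.loads(response_text)
    return [
        hit["_id"]
        for hit in response["hits"]["hits"]
        if not hit.get("highlight")
    ]


# Trimmed-down stand-in for a search response without highlights.
sample = (
    '{"hits": {"total": 1, "hits": ['
    '{"_id": "1", "_source": {"body": "Some interesting body"}}]}}'
)
print(hits_missing_highlight(sample))
# ['1']
```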
If I use the fast-vector-highlighter:
curl -XDELETE 'http://localhost:9200/test/'
{"ok":true,"acknowledged":true }
curl -XPUT 'http://localhost:9200/test/' -d '{
  "index" : true,
  "settings": {
    "number_of_shards" : 1
  },
  "default": {
    "include_in_all" : true
  },
  "mappings" : {
    "blog_post" : {
      "type" : "object",
      "_all" : {"enabled" : true, "analyzer" : "my_analyzer"},
      "_source": {"enabled" : true, "compress" : true},
      "properties": {
        "title": {
          "type" : "string",
          "index" : "analyzed",
          "analyzer" : "my_analyzer",
          "term_vector" : "with_positions_offsets",
          "include_in_all" : true
        },
        "body": {
          "type" : "string",
          "index" : "analyzed",
          "analyzer" : "my_analyzer",
          "term_vector" : "with_positions_offsets",
          "include_in_all" : true
        }
      }
    }
  }
}'
{"ok":true,"acknowledged":true}
curl -XPUT 'http://localhost:9200/test/blog_post/1' -d '{
"title" : "Some interesting title",
"body" : "Some interesting body"
}'
{"ok":true,"_index":"test","_type":"blog_post","_id":"1","_version":1}
curl -XGET 'http://localhost:9200/test/blog_post/_search?routing=arkon&pretty' -d '{
  "query": {
    "match" : {
      "_all" : {
        "query" : "interesting"
      }
    }
  },
  "highlight" : {
    "pre_tags" : ["<span class=\"highlight\">"],
    "post_tags" : [""],
    "fields" : {"body" : {}, "title": {}}
  }
}'
{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.16273327,
    "hits" : [ {
      "_index" : "test",
      "_type" : "blog_post",
      "_id" : "1",
      "_score" : 0.16273327, "_source" : {
        "title" : "Some interesting title",
        "body" : "Some interesting body"
      },
      "highlight" : {
        "body" : [ "Some <span class=\"highlight\">interesting body" ],
        "title" : [ "Some <span class=\"highlight\">interesting title" ]
      }
    } ]
  }
}
I get the same output with the plain highlighter and the whitespace tokenizer.
For now, I use the fast-vector-highlighter, which solves the problem, because
I can't change my tokenizer. But I wanted to ask whether something is wrong with
my configuration or my index mapping, or whether this is an unidentified bug.