Increasing slop causes highlighting to become inconsistent?


(Campbell) #1

Hi

I seem to be running into an issue. If I use the slop parameter on a query with highlighting, about 50% of the hits come back without a highlight field. The hits themselves are correct, though: if I go back to the original document for one of these "blank" hits, the queried terms are there in the text. So Elasticsearch is finding the right documents; about half of them just come back without any highlight field. If I set slop to 0, every hit has the highlight field. This is on Elasticsearch version 1.4.4.

For the field that I'm highlighting in, the mapping is:

"DocContent": { 
	"type":"string", 
	"store":"yes", 
	"index":"analyzed",
	"omit_norms":"true",
	"analyzer":"doc_analyzer",
	"term_vector":"with_positions_offsets",
	"include_in_all":"false"
}				

The query I'm using is:

{
	"query":{
		"filtered":{
			"query":{
				"match_phrase":{
					"DocContent":{
						"query":"the information",
						"slop":5
					}
				}
			},
			"filter":{
				"bool":{
					"must":[{"term":{"Date":"20150410"}}]
				}
			}
		}
	},
	"highlight":{
		"number_of_fragments":2,
		"fragment_size":1000,
		"require_field_match":"true",
		"pre_tags":["<MATCH>"],
		"post_tags":["</MATCH>"],
		"fields":{"DocContent":{}}
	},
	"from":0,
	"size":99999
}

This will return data that looks like:

...
	{
		"_id": "4bab2a03c0222db5c774430267e51d96",
		"_index": "main",
		"_score": 2.2285054,
		"_type": "document"
	},
	{
		"_id": "550f4aabe63de2464d455cfd16451993",
		"_index": "main",
		"_score": 2.1643262,
		"_type": "document",
		"highlight": {
			"DocContent": [
				"... <MATCH>the</MATCH> ... <MATCH>information</MATCH>..."
			]
		}
	},
	{
		"_id": "266312c459422b2a8bd3561a8031ff35",
		"_index": "main",
		"_score": 2.1643262,
		"_type": "document"
	},
...

However, if I remove the slop parameter completely or set it to 0, I never get "blank" hits with a missing highlight field. I can't find any pattern in which hits get the highlight field and which don't. Any idea what's going on? I used "the information" above, but this happens with a wide range of query strings.
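To measure how many hits are affected for a given slop value, the hits in a response can be scanned for a missing highlight entry. The sketch below is a hypothetical helper (`hits_missing_highlight` is not part of any Elasticsearch client). It assumes the response has already been parsed into a Python dict with the usual `hits.hits` array, and the sample reuses the three hits from the excerpt above with the fragment text shortened.

```python
import json


def hits_missing_highlight(response, field="DocContent"):
    """Return the _id of every hit that has no highlight fragments for `field`.

    Assumes `response` is the parsed JSON body of a search request
    (a dict containing hits.hits).
    """
    missing = []
    for hit in response.get("hits", {}).get("hits", []):
        # A hit counts as "blank" if it has no highlight object at all,
        # or if the highlight object has no fragments for this field.
        fragments = hit.get("highlight", {}).get(field)
        if not fragments:
            missing.append(hit["_id"])
    return missing


# Sample shaped like the response excerpt in the post (fragment text shortened).
sample = json.loads("""
{"hits": {"hits": [
  {"_id": "4bab2a03c0222db5c774430267e51d96", "_score": 2.2285054},
  {"_id": "550f4aabe63de2464d455cfd16451993", "_score": 2.1643262,
   "highlight": {"DocContent": ["... <MATCH>the</MATCH> ... <MATCH>information</MATCH>..."]}},
  {"_id": "266312c459422b2a8bd3561a8031ff35", "_score": 2.1643262}
]}}
""")

missing = hits_missing_highlight(sample)
total = len(sample["hits"]["hits"])
print(f"{len(missing)} of {total} hits lack a highlight: {missing}")
```

Running the same helper on responses for slop 0 and slop 5 gives a direct count of how many hits lose their highlight field.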


(Campbell) #2

In case anyone else comes across this: it looks like you have to use the 'plain' highlighter when using slop. Switching from the fast vector highlighter (fvh) to plain fixed it for me in every case where I used slop.
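In Elasticsearch 1.x the fast vector highlighter is chosen automatically for a field mapped with `"term_vector":"with_positions_offsets"`, which is why `DocContent` was using fvh even though the query never asked for it. The per-field `"type"` option in the highlight section overrides that choice. Below is a minimal sketch that rebuilds the request from the first post with `"type": "plain"` on the field; `build_search_body` is a hypothetical helper, and the resulting dict is assumed to be sent as the JSON body of the search request by whatever client is in use.

```python
import json


def build_search_body(phrase, slop, date, highlighter="plain"):
    """Rebuild the filtered match_phrase query from the thread.

    The only change from the original request is the per-field "type"
    option, which forces the given highlighter instead of the fvh default.
    """
    return {
        "query": {
            "filtered": {
                "query": {
                    "match_phrase": {"DocContent": {"query": phrase, "slop": slop}}
                },
                "filter": {"bool": {"must": [{"term": {"Date": date}}]}},
            }
        },
        "highlight": {
            "number_of_fragments": 2,
            "fragment_size": 1000,
            "require_field_match": True,
            "pre_tags": ["<MATCH>"],
            "post_tags": ["</MATCH>"],
            # Force the plain highlighter on this field (the workaround above).
            "fields": {"DocContent": {"type": highlighter}},
        },
        "from": 0,
        "size": 99999,
    }


body = build_search_body("the information", 5, "20150410")
print(json.dumps(body["highlight"], indent=2))
```

The plain highlighter re-analyzes the stored text at query time rather than reading term vectors, so it is typically slower on large fields such as a 1000-character fragment source; that is the trade-off for getting consistent highlights with slop.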

