I'm really struggling to get proper shingle searching to work. I've tried
dozens of variations, using text, query_string, bools, dis_max, the whole
works. I simply cannot get it to function the way that I want. I imagine
I'm doing something fundamentally wrong, since this seems like it should
be simple. My mapping looks like this: https://gist.github.com/4063964
Basically, I'm indexing a field with a normal tokenizer as well as a
shingle tokenizer. With regard to search, I want exact phrase matches to
rank first, then shingled (i.e. partial-phrase) matches next. I'm
searching for "Great Planes Rotor Blade" using the following query:
{
  "explain": true,
  "size": 5,
  "from": 0,
  "highlight": {
    "pre_tags": [""],
    "post_tags": [""],
    "fields": {
      "body": {}
    }
  },
  "query": {
    "dis_max": {
      "tie_breaker": 0.7,
      "queries": [
        {
          "text": {
            "body": {
              "query": "Great Planes Rotor Blade",
              "type": "phrase"
            }
          }
        },
        {
          "query_string": {
            "fields": ["body"],
            "query": "Great Planes Rotor Blade",
            "phrase_slop": 0,
            "minimum_should_match": "40%"
          }
        },
        {
          "query_string": {
            "fields": ["body"],
            "query": "Great Planes Rotor Blade",
            "phrase_slop": 0,
            "minimum_should_match": "40%",
            "analyzer": "analyzer_partial_shingle"
          }
        }
      ]
    }
  }
}
Unfortunately, I'm getting results all over the place. Some items which
use the word "blade" 4-5 times will rank higher than items that use the
phrase "Great Planes" once. I assumed that shingling the query (using
analyzer_partial_shingle) and then searching the indexed shingles would
find "Great Planes" and increase the score, but it doesn't seem to be
working that way.
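To make my expectation concrete, here is a rough Python sketch of the
terms I assume the shingled query breaks into. This is not Elasticsearch
code, only an illustration: the function name word_shingles and the
shingle sizes of 2 to 3 words are assumptions for the example, and the
real settings are whatever the gist above defines.

```python
def word_shingles(text, min_size=2, max_size=3):
    """Return lowercase word n-grams (shingles) of `text`.

    Shingles are ordered by start position, then by size.
    min_size and max_size are illustrative defaults, not the
    values from the actual mapping.
    """
    tokens = text.lower().split()
    shingles = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            # Only emit shingles that fit inside the token list.
            if start + size <= len(tokens):
                shingles.append(" ".join(tokens[start:start + size]))
    return shingles


query_shingles = word_shingles("Great Planes Rotor Blade")
print(query_shingles)
```

Under those assumptions, "great planes" is one of the query shingles, so
I would expect a document containing that phrase even once to pick up a
match on the shingled terms that a document repeating only "blade" does
not.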
Can anyone shed some light on what I'm doing wrong?