Fuzzy searches not matching as expected


(designermonkey) #1

I have a search set up to do fuzzy matching on the tokens that the standard
analyzer returns.

It all works well on single words, and we get accurate results, say, on a
title field like 'Cherry Cheesecake'. If we search for either 'Cherry', or
'Cheesecake' we get a result.

If we search for 'Cherry Cheesecake' we get 0 results.

I'm really at a loss as to how this can be happening, as I would have
thought it matches closer than either separate word.

The index is as follows:

{
"analysis": {
"analyzer": {
"custom_fulltext" : {
"type": "custom",
"tokenizer" : "standard",
"filter": ["stop", "asciifolding", "snowball", "lowercase",
"custom_synonyms", "custom_stop"]
}
},
"filter" : {
"custom_synonyms": {
"type": "synonym",
"ignore_case": "true",
"synonyms": [
"i-pod, i pod => ipod",
"definately, definitly, definetly => definitely"
]
},
"custom_stop": {
"type": "stop",
"stopwords": ["a", "an", "and", "are", "as", "at", "be", "but", "by",
"into", "is", "it", "of", "on", "or", "such", "that", "the", "their",
"there", "these", "they", "this", "to", "was", "will"]
}
}
}
}

The mapping is as follows:

{
"recipes": {
"properties": {
"title": {
"type": "string",
"index": "analyzed",
"analyzer": "nigella_fulltext"
},
"url": {
"type": "string",
"index": "no",
"include_in_all": false
},
"introduction": {
"type": "string",
"index": "no",
"include_in_all": false
},
"ingredients": {
"type": "string",
"index": "analyzed"
},
"moods": {
"type": "string",
"index": "not_analyzed",
"include_in_all": false
},
"occasions": {
"type": "string",
"index": "not_analyzed",
"include_in_all": false
},
"is_vegetarian": {
"type": "string",
"store": "yes",
"index": "not_analyzed",
"null_value": "no"
},
"is_gluten_free": {
"type": "string",
"store": "yes",
"index": "not_analyzed",
"null_value": "no"
},
"is_express": {
"type": "string",
"store": "yes",
"index": "not_analyzed",
"null_value": "no"
},
"is_baking": {
"type": "string",
"store": "yes",
"index": "not_analyzed",
"null_value": "no"
},
"is_premium": {
"type": "string",
"store": "yes",
"index": "not_analyzed",
"null_value": "no"
},
"ordering_field": {
"type": "integer",
"store": "yes",
"index": "not_analyzed",
"null_value": "0"
}
}
}
}

And the search is as follows:

{
"explain": true,
"sort": [{
"ordering_field": "desc"
}, "_score"],
"size": 15,
"from": 0,
"query": {
"bool": {
"should": [{
"bool": {
"should": [{
"fuzzy": {
"recipes.title": {
"value": "cherry cheesecake",
"min_similarity": 0.75,
"boost": 5
}
}
}, {
"fuzzy": {
"recipes.ingredients": {
"value": "cherry cheesecake",
"min_similarity": 0.75
}
}
}]
}
}]
}
},
"facets": {
"filtered_sections": {
"terms": {
"field": "_type"
}
},
"is_express": {
"terms": {
"field": "recipes.is_express"
}
},
"is_vegetarian": {
"terms": {
"field": "recipes.is_vegetarian"
}
},
"mood": {
"terms": {
"field": "recipes.moods"
}
},
"occasion": {
"terms": {
"field": "recipes.occasions"
}
}
},
"filter": [{
"terms": {
"_type": ["recipes"]
}
}]
}

I've been tearing my hair out trying to get search to return any kind of
accurate results, and this is the first time I have had good results, only
to find that it doesn't even work for a perfect match.

Any help anyone has would be greatly appreciated.

John.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Clinton Gormley) #2

Hi John

The fuzzy query is a term-based query, not a full text query, so it is
looking for a single term that is fuzzily like "cherry cheesecake".
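To illustrate the point, here is a minimal sketch in Python (not Elasticsearch code) that compares the whole, unanalyzed query string against the kind of terms the index would plausibly hold. The `indexed_terms` list is an assumption for illustration; the real terms depend on the analyzer, and stemming may shorten them further.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Assumed terms: a standard tokenizer plus lowercasing would split the title
# 'Cherry Cheesecake' into two separate terms.
indexed_terms = ["cherry", "cheesecake"]

# A term-level fuzzy query does not analyze its input, so the whole string
# is compared as one term.
query_term = "cherry cheesecake"

distances = {term: levenshtein(query_term, term) for term in indexed_terms}
print(distances)
```

Both distances are far larger than the small number of edits a fuzzy term query tolerates, which is consistent with getting 0 results for the two-word input while each single word matches.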

Change to using the match or multi_match query instead:

{
"multi_match": {
"fields": [
"recipes.title",
"recipes.ingredients"
],
"query": "cherry cheesecake",
"fuzziness": 2
}
}

clint

On 18 October 2013 18:03, John Porter john@designermonkey.co.uk wrote:

[quoted text of post #1 trimmed]


(designermonkey) #3

Hi Clint,

Thanks for responding. I think I see what you mean.

I also think I've got more accurate results using a 'text' query with
'fuzziness' for the search, and I'm wondering if I need to add the fuzzy
queries as well to match the single words?

Also, can you boost a specific field in a text query at search time?
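On the boosting question, one possible approach is sketched below in Python: it builds a multi_match body in which a field name carries a `^boost` suffix, so the title keeps the weight of 5 used in the original fuzzy query. The helper name `build_query` and its parameters are hypothetical, and whether per-field boosts and `fuzziness` behave as wanted should be checked against the Elasticsearch version in use.

```python
import json


def build_query(text, boosts, fuzziness=2):
    """Build a multi_match body; fields with a boost other than 1 get a '^boost' suffix."""
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in boosts.items()
    ]
    return {
        "multi_match": {
            "fields": fields,
            "query": text,
            "fuzziness": fuzziness,
        }
    }


# Field names are taken from the mapping in this thread; the boost of 5 on
# the title mirrors the original fuzzy query.
query = build_query(
    "cherry cheesecake",
    {"recipes.title": 5, "recipes.ingredients": 1},
)
print(json.dumps({"query": query}, indent=2))
```

The printed JSON can be used as the "query" part of the search request in place of the nested bool/fuzzy clauses.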

On Friday, 18 October 2013 18:05:54 UTC+1, Clinton Gormley wrote:

[quoted text of posts #1 and #2 trimmed]

