Stemming not working as expected


(Ray) #1

Using the analyze API with the "default" analyzer, I get "fold" and "knife" because of the synonym. That's working as expected. However, we're not getting the correct results back from the search API.

GET /inventory/_analyze
{
"analyzer": "default",
"text": "folding knives"
}

The above returns:

{
"tokens": [
{
"token": "fold",
"start_offset": 0,
"end_offset": 7,
"type": "word",
"position": 0
},
{
"token": "knife",
"start_offset": 8,
"end_offset": 14,
"type": "SYNONYM",
"position": 1
}
]
}

The search API, using the "default" analyzer:

GET /inventory/products/_search
{
"explain": true,
"sort" : [
{ "_score" : {"order" : "desc"}},
"id.keyword"
],
"query": {
"multi_match": {
"query": "folding knives",
"type": "most_fields",
"fuzziness": 1,
"prefix_length": 3,
"operator": "AND",
"minimum_should_match": "2<75%",
"analyzer": "default",
"fields": [
"manufacturer^3", "manufacturer.raw^3",
"everything", "everything.raw"
]
}
}
}

The above returns:

{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}

But it should return many results. Here are two of the documents that should be returned by this search:

      "everything": "ACC041C, 015896000416, ACCUSHARP CLASSIC COMBO PACK BLU, Knives, Knives & Tools, AccuSharp, Model 041C, Folding Knife, Blue Aluminum Grip, Stainless Steel Blade, Includes SharpNEasy Tool Sharpener, Plain, Bl, Blue, Aluminum, ACCU, AccuSharp, 041C, AccuSharp, Card, Folding Knives, Folding Knife",
      "manufacturer": "ACCU, AccuSharp"
      "everything": """BH15PM01BK, 648018100284, BH POINT MAN PLN BLK, Knives, Knives & Tools, BLACKHAWK!, Point Man, Folding Knife, 3.4" Black PVD Coated AUS8A Stainless Steel Blade, Plain Edge, G10 Scales, Pocket Clip, Plain, Blk, Black, Stainless Steel Liners w/G10 Scales, BH, BLACKHAWK!, 15PM01BK, Point Man, 3.4", Fixed Blade Knives, Folding Knife""",
      "manufacturer": "BH, BLACKHAWK!"

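For comparison, the index-time side could be checked the same way, by pointing the analyze API at the field instead of naming the analyzer. This is only a sketch and has not been run against this index; the sample text is a fragment copied from the first document above, and the assumption is that the "everything" field resolves to the same "default" analyzer.

```console
GET /inventory/_analyze
{
  "field": "everything",
  "text": "Folding Knives, Folding Knife"
}
```

If the tokens returned here differ from "fold" and "knife", the query terms and the indexed terms would not line up.
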
Are we missing something here? Why is it not returning the documents that do match? I can provide the settings and mapping if needed.

On second thought, here are the mapping and settings:

{
    "mappings": {
        "products": {
            "properties": {
                "everything": {
                    "fields": {
                        "raw": {
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "id": {
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "manufacturer": {
                    "fields": {
                        "raw": {
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "upsert": {
                    "properties": {
                        "counter": {
                            "type": "long"
                        }
                    }
                }
            }
        }
    },
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "filter": ["lowercase", "custom_synonym", "custom_no_stem", "custom_stemmer", "custom_word_delimiter", "custom_stop"],
                    "tokenizer": "whitespace",
                    "type": "custom"
                }
            },
            "filter": {
                "custom_no_stem": {
                    "keywords": ["accessories", "agency", "ammunition", "arsenal", "axe", "browning", "bushmaster", "charging", "collapsible", "equipment", "ets", "finder", "fired", "gas", "handle", "mbus", "optical", "precision", "prs", "range", "rangefinder", "reloading", "revolver", "rifles", "sharpener", "silencer", "sporting", "suppressor", "tactical"],
                    "type": "keyword_marker"
                },
                "custom_stemmer": {
                    "name": "english",
                    "type": "stemmer"
                },
                "custom_stop": {
                    "stopwords": "_english_",
                    "type": "stop"
                },
                "custom_synonym": {
                    "synonyms_path": "analysis/synonym.txt",
                    "type": "synonym"
                },
                "custom_word_delimiter": {
                    "preserve_original": "true",
                    "protected_words": [<excluded due to body limit>],
                    "split_on_numerics": "false",
                    "type": "word_delimiter"
                }
            }
        }
    }
}
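
Since the "default" analyzer chains six token filters, a per-filter breakdown may help show at which step the tokens change. The request below is a sketch using the analyze API's "explain" option; it has not been run against this index, and the sample text is a fragment copied from the first document above.

```console
GET /inventory/_analyze
{
  "analyzer": "default",
  "text": "Folding Knives, Folding Knife",
  "explain": true
}
```

The response should list the tokens after the tokenizer and after each token filter in turn, which can be compared against the same request run with the query text "folding knives".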