We currently get zero results when a search term is combined with a quantity, for example coke 1l. So I want to index a new field, packquantity, with values like 1 kg, 500 g or 2l. My concern is how to handle the different ways units and spacing can be written (1kg, 1 KG, kilogram, liter), and how to handle decimals such as 0.5 kg?
```
"word_delimiter_graph": {
  "type": "word_delimiter_graph",
  "preserve_original": true,
  "catenate_all": true
}

"msg_packagingunit_split": {
  "type": "pattern_replace",
  "pattern": """(\d+(?:\.\d+)?)([a-zA-Z]+)""",
  "replacement": "$1 $2"
}

"packagingunit": {
  "char_filter": ["msg_mapping_char_filter", "msg_packagingunit_split"],
  "filter": [
    "lowercase",
    "word_delimiter_graph"
  ],
  "type": "custom",
  "tokenizer": "standard"
},
"packagingunit_search": {
  "char_filter": ["msg_mapping_char_filter", "msg_packagingunit_split"],
  "filter": [
    "lowercase",
    "msg_synonym",
    "word_delimiter_graph"
  ],
  "type": "custom",
  "tokenizer": "standard"
}

"PackagingUnit": {
  "type": "keyword",
  "fields": {
    "text": {
      "type": "text",
      "analyzer": "packagingunit",
      "search_analyzer": "packagingunit_search"
    }
  }
}
```

I am using these settings and mappings for the field, but I still get no results for rice 1kg, while rice 1 kg does return results.
query:
```
"query": {
  "bool": {
    "must": [
      {
        "multi_match": {
          "query": "rice 1kg",
          "fields": [
            "AdditionalEAN.text^1.0",
            "CategoryNameLevel0_DFS.text^0.01",
            "CategoryNameLevel1_DFS.text^0.7",
            "CategoryNameLevel2_DFS.text^0.65",
            "CategoryNameLevel3_DFS.text^0.4",
            "CategoryNameLevel4_DFS.text^0.3",
            "CategoryNameLevel5_DFS.text^0.3",
            "EANCode.text^1.0",
            "MSG.AT.Badges.text^0.05",
            "ManufacturerName.text^0.65",
            "SKU.text^1.0",
            "SearchIndexKeywords.text^0.2",
            "longDescription.text^0.01",
            "name.keyword^2.0",
            "name.text^1.0",
            "shortDescription.text^0.02",
            "PackagingUnit.keyword^2.0",
            "PackagingUnit.text^1.0"
          ],
          "type": "most_fields",
          "operator": "OR",
          "slop": 0,
          "prefix_length": 0,
          "max_expansions": 50,
          "zero_terms_query": "NONE",
          "auto_generate_synonyms_phrase_query": false,
          "fuzzy_transpositions": true,
          "boost": 1
        }
      }
    ],
    "filter": [
      {
        "multi_match": {
          "query": "rice 1kg",
          "fields": [
            "AdditionalEAN.text^1.0",
            "CategoryNameLevel0_DFS.text^1.0",
            "CategoryNameLevel1_DFS.text^1.0",
            "CategoryNameLevel2_DFS.text^1.0",
            "CategoryNameLevel3_DFS.text^1.0",
            "CategoryNameLevel4_DFS.text^1.0",
            "CategoryNameLevel5_DFS.text^1.0",
            "EANCode.text^1.0",
            "MSG.AT.Badges.text^1.0",
            "ManufacturerName.text^1.0",
            "SKU.text^1.0",
            "SearchIndexKeywords.text^1.0",
            "longDescription.text^1.0",
            "name.keyword^1.0",
            "name.text^1.0",
            "shortDescription.text^1.0",
            "PackagingUnit.text^1.0",
            "PackagingUnit.keyword^1.0"
          ],
          "type": "cross_fields",
          "operator": "OR",
          "analyzer": "autocomplete_search",
          "slop": 0,
          "prefix_length": 0,
          "max_expansions": 50,
          "minimum_should_match": "100%",
          "zero_terms_query": "NONE",
          "auto_generate_synonyms_phrase_query": false,
          "fuzzy_transpositions": true,
          "boost": 1
        }
      },
      {
        "bool": {
          "should": [
            {
              "bool": {
                "filter": [
                  {
                    "terms": {
                      "brand": ["ALL", "DRI", "MED"],
                      "boost": 1
                    }
                  }
                ],
                "adjust_pure_negative": true,
                "boost": 1
              }
            }
          ],
          "adjust_pure_negative": true,
          "minimum_should_match": "1",
          "boost": 1
        }
      }
    ],
    "adjust_pure_negative": true,
    "boost": 1
  }
}
```
Please format your code samples with triple backticks, and provide a fully reproducible example including sample documents and the full index mapping; that makes it much easier to reproduce your exact case.
In general, this is not a simple problem from a search perspective. First, as you already noted, your users will often search differently from how your data is stored (1kg vs. 1 kg is the simplest example). So the main question is how to unify the two. As usual, the classic answer is index time vs. query time: either you find a way to index both 1kg and 1 kg so that either search hits, or you normalize 1kg to 1 kg (or vice versa) at both index and query time so that both sides produce the same tokens.
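To illustrate the "normalize at index and query time" option, here is a minimal Python sketch (application-side code, not an Elasticsearch feature) that rewrites common spellings to one canonical form. The alias table and the name `normalize_quantities` are hypothetical; the idea is to run the same function over documents before indexing and over user input before searching.

```python
import re

# Hypothetical alias table (extend it for your catalog): every spelling of a
# unit maps to one canonical token.
UNIT_ALIASES = {
    "kg": "kg", "kgs": "kg", "kilo": "kg", "kilos": "kg",
    "kilogram": "kg", "kilograms": "kg",
    "g": "g", "gr": "g", "gram": "g", "grams": "g",
    "l": "l", "ltr": "l", "liter": "l", "liters": "l",
    "litre": "l", "litres": "l",
    "ml": "ml", "milliliter": "ml", "millilitre": "ml",
}

# A number (optional "." or "," decimal part), optional whitespace, then a word.
QUANTITY_RE = re.compile(r"\b(\d+(?:[.,]\d+)?)\s*([a-z]+)\b")


def normalize_quantities(text):
    """Lowercase `text` and rewrite '1kg', '1 KG', '1 kilogram' as '1 kg'.

    Run it on documents before indexing and on queries before searching,
    so both sides end up with the same tokens.
    """
    def replace(match):
        number = match.group(1).replace(",", ".")
        unit = UNIT_ALIASES.get(match.group(2))
        if unit is None:
            return match.group(0)  # not a known unit, keep it unchanged
        return f"{number} {unit}"

    return QUANTITY_RE.sub(replace, text.lower())


print(normalize_quantities("Rice 1KG"))        # rice 1 kg
print(normalize_quantities("coke 1,5 Litre"))  # coke 1.5 l
```

Inside Elasticsearch the same idea can be approximated with a synonym or mapping filter for the unit spellings plus a pattern_replace char filter like the one in the question, as long as both the index analyzer and the search analyzer apply them.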
The problem does not stop there, either: you may also need to normalize the numbers themselves (for example 0.5 kg vs. 500 g), see here.
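For the 0.5 kg question, one option is to convert every quantity to a base unit (grams, milliliters) before indexing, so that 0.5 kg and 500 g end up as the same number. A minimal sketch, assuming the units were already normalized to kg, g, l or ml; `TO_BASE` and `to_base_unit` are hypothetical names.

```python
import re
from decimal import Decimal

# Assumed conversion table: unit -> (base unit, multiplier).
TO_BASE = {
    "kg": ("g", Decimal(1000)),
    "g": ("g", Decimal(1)),
    "l": ("ml", Decimal(1000)),
    "ml": ("ml", Decimal(1)),
}

# Assumes units were already normalized to kg, g, l or ml.
QUANTITY_RE = re.compile(r"\b(\d+(?:[.,]\d+)?)\s*(kg|g|l|ml)\b", re.IGNORECASE)


def to_base_unit(text):
    """Return (amount, base_unit) for the first quantity in `text`, or None."""
    match = QUANTITY_RE.search(text)
    if match is None:
        return None
    amount = Decimal(match.group(1).replace(",", "."))
    base_unit, factor = TO_BASE[match.group(2).lower()]
    return amount * factor, base_unit


print(to_base_unit("0.5 kg") == to_base_unit("500 g"))  # True
```

`Decimal` is used instead of `float` so that values like 0.1 kg convert exactly. The result could be stored in a numeric field plus a unit keyword (for example hypothetical `pack_quantity` and `pack_unit` fields) and queried with term or range filters.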
If you have good product data, this information might already be available in dedicated attributes. If your application has a good query parser, you might be able to split the quantity from the main query and filter on those attributes. If you are only interested in basic matching, a regular expression that always splits a number directly followed by letters (e.g. 1kg) by inserting a space in between might already be good enough.
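A sketch of the query-parser approach: pull the quantity out of the user's input and send it as an exact filter, while the remaining words go to a full-text match. `split_query` and `build_query` are hypothetical; the field names `name.text` and `PackagingUnit` come from the question, and the term filter assumes `PackagingUnit` values are stored in the same lowercase "number unit" form (for example "1 kg"), because a term query on a keyword field is exact and case-sensitive.

```python
import json
import re

# Assumed pattern: a number, optional whitespace, then one of a few known units.
QUANTITY_RE = re.compile(r"\b(\d+(?:[.,]\d+)?)\s*(kg|g|l|ml)\b", re.IGNORECASE)


def split_query(user_query):
    """Split 'rice 1kg' into the free text 'rice' and the quantity '1 kg'."""
    match = QUANTITY_RE.search(user_query)
    if match is None:
        return " ".join(user_query.split()), None
    rest = user_query[:match.start()] + " " + user_query[match.end():]
    quantity = f"{match.group(1).replace(',', '.')} {match.group(2).lower()}"
    return " ".join(rest.split()), quantity


def build_query(user_query):
    """Full-text match on the remaining words, exact filter on the quantity."""
    text, quantity = split_query(user_query)
    must = [{"match": {"name.text": text}}] if text else [{"match_all": {}}]
    bool_query = {"must": must}
    if quantity is not None:
        # Assumes PackagingUnit stores normalized values such as "1 kg".
        bool_query["filter"] = [{"term": {"PackagingUnit": quantity}}]
    return {"query": {"bool": bool_query}}


print(json.dumps(build_query("rice 1kg"), indent=2))
```

Because the regex allows optional whitespace, rice 1kg and rice 1 kg both yield the same filter value.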
Also, if you do not know why something does not match, use the analyze API to see which tokens are produced and the explain API to see why a document does or does not match.
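For the case in the question, a useful first check is to run the same text through every analyzer involved and compare the tokens. Note that the filter clause of the query sets "analyzer": "autocomplete_search", which replaces the search analyzer for every listed field, so packagingunit_search is not applied there; that may be why rice 1kg gets filtered out. (The mapping shown also defines PackagingUnit itself as the keyword field, so PackagingUnit.keyword may not exist.) The helper below only builds Kibana Dev Tools requests; the index name `products` and the document ID are placeholders, and the function names are hypothetical.

```python
import json

# Analyzer names are taken from the question.
ANALYZERS = ("packagingunit", "packagingunit_search", "autocomplete_search")


def analyze_requests(index, text, analyzers=ANALYZERS):
    """Return Dev Tools requests that show the tokens each analyzer emits."""
    requests = []
    for name in analyzers:
        body = {"analyzer": name, "text": text}
        requests.append(f"GET /{index}/_analyze\n{json.dumps(body, indent=2)}")
    return requests


def explain_request(index, doc_id, query):
    """Return a Dev Tools request explaining whether `doc_id` matches `query`."""
    body = {"query": query}
    return f"GET /{index}/_explain/{doc_id}\n{json.dumps(body, indent=2)}"


if __name__ == "__main__":
    for request in analyze_requests("products", "rice 1kg"):
        print(request, end="\n\n")
    # "some-doc-id" stands for the ID of a rice product you expect to match.
    print(explain_request("products", "some-doc-id",
                          {"match": {"PackagingUnit.text": "1kg"}}))
```

If packagingunit_search emits 1 and kg while autocomplete_search emits a single 1kg token, the two sides of the filter clause will not meet.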
Hope this helps as a start.