Relevance with short combinations of letters

Snail_Banshee · April 12, 2018, 12:25pm

Hello, everyone!
I'm trying to build a relevant search for cars (including brands, models and part names).
For example, a search request is: "air filter audi a3".

My settings (sorry for the PHP array):
'settings' => [
    'analysis' => [
        'filter'=>[
            'ru_stop'=>[
                "type"=>"stop",
                "stopwords"=>"russian"
            ],
            'ru_stemmer'=>[
                "type"=>"stemmer",
                "language"=>"russian"
            ],
            'english_stop'=>[
                "type"=>"stop",
                "stopwords"=>"english"
            ],
            'english_stemmer'=>[
                "type"=>"stemmer",
                "language"=>"english"
            ],
            'synonym'=>[
                "type"=>"synonym",
                "synonyms_path"=>$this->iCustom->iDRoot."/files/es_synonyms.txt"
            ],
            'snowball'=>[
                "type"=>"snowball",
                "language"=>"Russian"
            ],
            "word_delimiter"=>[
                "type"=>"word_delimiter",
                "split_on_numerics"=>false,
                "split_on_case_change"=>false,
                "generate_word_parts"=>true,
                "generate_number_parts"=>false,
                "catenate_words"=>true,
                'preserve_original'=>true
            ],
            'ascii_folding'=>[
                "type"=>"asciifolding",
                "preserve_original"=>true
            ],
            'gram_filter'=>[
                'type'=>'nGram',
                'min_gram'=>3,
                'max_gram'=>15,
                'token_chars'=>[
                    'letter',
                    'digit'
                ]
            ],
            'unique_stem'=>[
                'type'=>'unique',
                'only_on_same_position'=>true
            ],
            'length_filter'=>[
                'type'=>'length',
                'min'=>2,
                'max'=>20
            ],
        ],
        'tokenizer'=>[
            'gram_tokenizer'=>[
                'type'=>'nGram',
                'min_gram'=>3,
                'max_gram'=>15,
                'token_chars'=>[
                    'letter',
                    'digit'
                ]
            ]
        ],
        'analyzer'=>[
            'default'=>[
                'type'=>'custom',
                "char_filter"=>[
                    "html_strip"
                ],
                "tokenizer"=>'standard',
                "filter"=>[
                    'trim',
                    "lowercase",
                    "word_delimiter",
                    "snowball",
                    "ascii_folding",
                    //"keyword_repeat",
                    //"unique_stem",
                    "ru_stop",
                    "ru_stemmer",
                    "english_stop",
                    "english_stemmer",
                    "synonym",
                    'length_filter',
                    //"gram_filter",
                ]
            ]
        ]
    ]
],
'mappings'=>[
    'item'=>[
        'properties'=>[
            "title"=>[
                "type"=>"string",
                "index"=>"not_analyzed",
                'index_options'=>'docs',
            ],
            "data"=>[
                "type"=>"string",
                'analyzer'=>'default',
            ],
            "description"=>[
                "type"=>"string",
                "analyzer"=>"default",
                "store"=>true
            ],
            "numbers"=>[
                "type"=>"string",
                "store"=>true,
                "index"=>"not_analyzed"
            ],
            "timestamp"=>[
                "type"=>"integer",
                "index"=>"not_analyzed"
            ],
        ]
    ]
]
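Because the question is about short tokens such as `a3`, it may help to see which parts of this chain depend on token length. The following is a rough Python approximation, not Elasticsearch itself: `approx_tokenize` only approximates the `standard` tokenizer by splitting on non-alphanumeric characters, and `word_delimiter`, stemming, stop words and synonyms are ignored. Under those assumptions, `a3` (two characters) passes `length_filter` with `min => 2`, while an n-gram step with `min_gram => 3`, like the commented-out `gram_filter`, would emit no grams for it at all.

```python
import re


# Approximation only: the real `standard` tokenizer follows Unicode text
# segmentation rules; here we lowercase and split on non-alphanumerics.
def approx_tokenize(text):
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# Mirrors the `length_filter` settings above: keep tokens of 2..20 characters.
def length_filter(tokens, min_len=2, max_len=20):
    return [t for t in tokens if min_len <= len(t) <= max_len]


# Mirrors an nGram token filter (like `gram_filter`, 3..15): every substring
# whose length lies between min_gram and max_gram. Elasticsearch may emit
# the grams in a different order.
def ngrams(token, min_gram=3, max_gram=15):
    return [
        token[i:i + n]
        for n in range(min_gram, min(max_gram, len(token)) + 1)
        for i in range(len(token) - n + 1)
    ]


tokens = length_filter(approx_tokenize("air filter audi a3"))
print(tokens)        # ['air', 'filter', 'audi', 'a3']
print(ngrams("a3"))  # [] : shorter than min_gram, so no grams at all
print(ngrams("audi"))
```

With `min_len=3`, the same `length_filter` call would drop `a3` entirely, so the `min => 2` setting is what keeps two-character model names searchable in this chain.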

Some sample values of the "data" field in ES:

Air filter MAPCO 65217; AUDI A4 (8E2, B6) 1.8 T 2002 2001 2000; AUDI A4 Avant (8E5, B6) 1.8 T 2002 2001;
Air filter: Air filter MAGNETI MARELLI 153071760244; AUDI A4 (8K2, B8) 1.8 TFSI 2012 2011 2010 2009 2008 2007; AUDI A4 Avant (8K5, B8) 1.8 TFSI 2012 2011 2010 2009 2008 2007;
Air filter FEBI BILSTEIN 48477; AUDI Q7 (4LB) 3.0 TDI quattro 2008 2007 2006; VW TOUAREG (7LA, 7L6, 7L7) 2.5 R5 TDI 2010 2009 2008 2007 2006 2005 2004 2003;
Air filter: Air filter FRAM CA10236; AUDI Q7 (4LB) 3.0 TDI quattro 2010 2009 2008 2007 2006; AUDI A3 (8V1, 8VK) S3 quattro 2018 2017 2016 2015 2014 2013 2012; PORSCHE CAYENNE (9PA) 3.0 TDI 2010 2009;
Inside air filter FRAM CFA8869; AUDI TT Roadster (8N9) 1.8 T 2006 2005 2004 2003 2002 2001 2000 1999; AUDI A3 (8L1) 1.6 2003 2002 2001 2000 1999 1998 1997 1996; AUDI TT (8N3) 1.8 T 2006 2005 2004 2003 2002 2001 2000 1999 1998;

etc...
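To compare these sample documents against the query, one can count how often each query term occurs in every `data` value and roughly how long each value is. In Lucene's default scoring, term frequency, inverse document frequency and field length all influence the score, and longer field values generally get a smaller contribution per matching term. The sketch below is a plain-Python approximation, not an Elasticsearch call: it splits on non-alphanumeric characters instead of running the `default` analyzer (for example, `1.8` becomes two tokens here), so the token counts are only indicative. In this rough count, the two A3 documents are the longest of the five.

```python
import re
from collections import Counter

# The five sample "data" values from the post, copied verbatim.
DOCS = [
    "Air filter MAPCO 65217; AUDI A4 (8E2, B6) 1.8 T 2002 2001 2000; AUDI A4 Avant (8E5, B6) 1.8 T 2002 2001;",
    "Air filter: Air filter MAGNETI MARELLI 153071760244; AUDI A4 (8K2, B8) 1.8 TFSI 2012 2011 2010 2009 2008 2007; AUDI A4 Avant (8K5, B8) 1.8 TFSI 2012 2011 2010 2009 2008 2007;",
    "Air filter FEBI BILSTEIN 48477; AUDI Q7 (4LB) 3.0 TDI quattro 2008 2007 2006; VW TOUAREG (7LA, 7L6, 7L7) 2.5 R5 TDI 2010 2009 2008 2007 2006 2005 2004 2003;",
    "Air filter: Air filter FRAM CA10236; AUDI Q7 (4LB) 3.0 TDI quattro 2010 2009 2008 2007 2006; AUDI A3 (8V1, 8VK) S3 quattro 2018 2017 2016 2015 2014 2013 2012; PORSCHE CAYENNE (9PA) 3.0 TDI 2010 2009;",
    "Inside air filter FRAM CFA8869; AUDI TT Roadster (8N9) 1.8 T 2006 2005 2004 2003 2002 2001 2000 1999; AUDI A3 (8L1) 1.6 2003 2002 2001 2000 1999 1998 1997 1996; AUDI TT (8N3) 1.8 T 2006 2005 2004 2003 2002 2001 2000 1999 1998;",
]

QUERY_TERMS = ["air", "filter", "audi", "a3"]


# Rough stand-in for the analyzed field: lowercase, split on non-alphanumerics.
def approx_tokens(text):
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# Returns (occurrences of each query term, approximate token count).
def term_stats(text):
    tokens = approx_tokens(text)
    counts = Counter(tokens)
    return {term: counts[term] for term in QUERY_TERMS}, len(tokens)


for doc in DOCS:
    freqs, length = term_stats(doc)
    # Label each row with the part before the first ";" (part name and number).
    print(doc.split(";")[0], freqs, "tokens:", length)
```

The exact numbers Elasticsearch uses can be seen per hit with the `explain` option of a search request.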

So, in the response, documents whose data contains Audi A3 get a lower score than, for example, documents for Audi A4 or Audi Q7, etc.
Why is that?
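The post does not show the query that was actually sent, so the following is only a sketch, assuming a plain `match` query on the `data` field. Setting the top-level `explain` flag in a search request body asks Elasticsearch to return, for each hit, a breakdown of how its score was computed, which is one way to see where the A3 documents lose out. The helper `build_search_body` is hypothetical; the resulting dict would be sent as the body of a `_search` request with whichever client is in use.

```python
import json


# Hypothetical helper: the real query is not shown in the post, so a plain
# `match` query on the `data` field is assumed here.
def build_search_body(query_text, field="data", explain=True):
    return {
        # Ask Elasticsearch to include a per-hit score explanation.
        "explain": explain,
        "query": {
            "match": {
                field: query_text,
            }
        },
    }


body = build_search_body("air filter audi a3")
print(json.dumps(body, indent=2))
```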

P.S.: I'm new to ES, so please don't judge too strictly :))
Thanks to all!

system · May 10, 2018, 12:25pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic	Category	Replies	Views	Activity
Search with stemming and stopwords (german)	Elasticsearch	9	3489	July 6, 2017
Some help in the right direction :)	Elasticsearch	2	410	January 4, 2020
ElasticSearch as a suggestion engine	Elasticsearch	1	301	July 6, 2017
Even searching with elasticsearch I wasn't able to find a solution	Elasticsearch	1	811	August 30, 2017
Help with Synonyms	Elasticsearch	6	513	July 6, 2017
