I am using Elasticsearch for a job search. Each job has a Title, Description, Company and Location.
I am using english_analyzer for Title and Description and keyword_analyzer for Company and Location, as shown below (I am using NEST):
[ElasticsearchType(Name = "jobdocument")]
public class JobDocument
{
    // By convention, the Id property becomes the Id of the Elasticsearch document; Id is mapped to AdBaseId
    public long Id { get; set; }

    // Keyword => not analyzed; use the ignore-case normalizer, otherwise search would be case sensitive
    [Keyword(Normalizer = "custom_ignore_case_normalizer")]
    public string CompanyName { get; set; }

    [Text(Analyzer = "custom_english_analyzer", SearchAnalyzer = "custom_english_analyzer")]
    public string Title { get; set; }

    [Text(Analyzer = "custom_english_analyzer", SearchAnalyzer = "custom_english_analyzer")]
    public string Description { get; set; }

    // Keyword => not analyzed; use the ignore-case normalizer, otherwise search would be case sensitive
    [Keyword(Normalizer = "custom_ignore_case_normalizer")]
    public string Locality { get; set; }
}
And here is the index creation code:
var createIndexResponse = ElasticClient.CreateIndex(IndexName, c => c
    .Settings(st => st
        .Analysis(an => an
            .Analyzers(anz => anz
                .Custom("result_suggester_analyzer", rsa => rsa
                    .CharFilters("html_strip") // put synonyms_token_filter after lowercase
                    .Filters(new string[] { "english_possessive_stemmer", "lowercase", "synonyms_token_filter", "asciifolding", "stop_words", "english_stemmer", "edge_ngram_token_filter", "unique" })
                )
                .Custom("custom_english_analyzer", ce => ce
                    .Filters(new string[] { "english_possessive_stemmer", "lowercase", "synonyms_token_filter", "asciifolding", "stop_words", "english_stemmer", "unique" })
                )
            )
            .Normalizers(nor => nor
                .Custom("custom_ignore_case_normalizer", icn => icn
                    .Filters(new string[] { "lowercase", "asciifolding" })
                )
            )
            .TokenFilters(tfd => tfd
                .EdgeNGram("edge_ngram_token_filter", engd => engd)
                .Stop("stop_words", sfd => sfd.StopWords(_stopWords))
                .Stemmer("english_stemmer", esd => esd.Language("english"))
                .Stemmer("english_possessive_stemmer", epsd => epsd.Language("possessive_english"))
                .Synonym("synonyms_token_filter", s => s.Synonyms(Synonym.List)) // SynonymsPath
            )
        )
    )
    .Mappings(m => m.Map<JobDocument>(d => d.AutoMap())));
When I search for terms like "C#" or ".net", they cannot be found. I assume the reason is that the English analyzer strips special characters?
But if I use keyword_analyzer instead, then terms like 'programmer' and 'programming' would be treated as different terms.
How should I address this? Does it make sense to have two analyzers for Title and Keyword?
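To check my assumption about special characters, the terms could be run through the analyzer with the Analyze API to see which tokens actually come out. This is only a minimal sketch: it assumes NEST 6.x (where `IElasticClient.Analyze` is available) and that the index already exists with the analysis settings from this post. `AnalyzerCheck` and `PrintTokens` are hypothetical names used for illustration only.

```csharp
using System;
using System.Linq;
using Nest;

// Hypothetical helper, not part of my project: prints the tokens that
// custom_english_analyzer produces for a few sample terms.
public static class AnalyzerCheck
{
    public static void PrintTokens(IElasticClient client, string indexName)
    {
        foreach (var term in new[] { "C#", ".net", "programmer", "programming" })
        {
            // Ask Elasticsearch to analyze the text with the analyzer defined on the index.
            var response = client.Analyze(a => a
                .Index(indexName)
                .Analyzer("custom_english_analyzer")
                .Text(term));

            if (!response.IsValid)
            {
                // The request failed (e.g. index or analyzer not found); show why.
                Console.WriteLine($"{term} -> analyze failed: {response.DebugInformation}");
                continue;
            }

            // Join the emitted tokens so the effect of tokenizer and filters is visible.
            var tokens = string.Join(", ", response.Tokens.Select(t => t.Token));
            Console.WriteLine($"{term} -> [{tokens}]");
        }
    }
}
```

Calling `AnalyzerCheck.PrintTokens(ElasticClient, IndexName)` would show whether "#" and the leading "." survive analysis, and whether 'programmer' and 'programming' end up as the same token.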