How can I solve the Turkish letter issue in Elasticsearch using the C# NEST client?

Yusuf_Karatoprak · March 24, 2017, 1:15pm

In Turkish we have letters such as 'ğ', 'ü', 'ş', 'ı', 'ö', 'ç', but when searching we usually type 'g', 'u', 's', 'i', 'o', 'c' instead. This is not a rule, just a habit. For example, if I type an uppercase "Ş", the search should match both "ş" and "s". Please see this link; it describes the same problem, but the solution there is too long and not perfect. How can I achieve the following?

My goal is this:

ProductName or Category.CategoryName may contain Turkish letters ("Eşarp"), or may be mistyped and written with English letters ("Esarp").
The query string may contain Turkish letters ("eşarp") or not ("esarp").
The query string may have multiple words.
Every indexed string field should be searched against the query string (full-text search).

Indexing and full-text searching in Elasticsearch without diacritics using the C# client NEST

My code is:

using Nest;

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ElasticSearchTest2
{
    class Program
    {
        public static Uri EsNode;
        public static ConnectionSettings EsConfig;
        public static ElasticClient client;

        static void Main(string[] args)
        {
            EsNode = new Uri("http://localhost:9200/");
            EsConfig = new ConnectionSettings(EsNode);
            client = new ElasticClient(EsConfig);

            // Index-time analyzer: lowercases, builds edge n-grams, folds to ASCII.
            var partialName = new CustomAnalyzer
            {
                Filter = new List<string> { "lowercase", "name_ngrams", "standard", "asciifolding" },
                Tokenizer = "standard"
            };

            // Search-time analyzer: lowercases and folds to ASCII, without n-grams.
            var fullName = new CustomAnalyzer
            {
                Filter = new List<string> { "standard", "lowercase", "asciifolding" },
                Tokenizer = "standard"
            };

            // Create the index with both analyzers and map Employee.Name to them.
            client.CreateIndex("indexname", c => c
                .Analysis(descriptor => descriptor
                    .TokenFilters(bases => bases.Add("name_ngrams", new EdgeNGramTokenFilter
                    {
                        MaxGram = 20,
                        MinGram = 2,
                        Side = "front"
                    }))
                    .Analyzers(bases => bases
                        .Add("partial_name", partialName)
                        .Add("full_name", fullName))
                )
                .AddMapping<Employee>(m => m
                    .Properties(o => o
                        .String(i => i
                            .Name(x => x.Name)
                            .IndexAnalyzer("partial_name")
                            .SearchAnalyzer("full_name")
                        ))));

            // Index a few sample documents that contain Turkish letters.
            Employee emp = new Employee() { Name = "yılmaz", SurName = "eşarp" };
            client.Index<Employee>(emp, idx => idx.Index("employeeindex7"));
            Employee emp2 = new Employee() { Name = "ayşe", SurName = "eşarp" };
            client.Index<Employee>(emp2, idx => idx.Index("employeeindex7"));
            Employee emp3 = new Employee() { Name = "ömer", SurName = "eşarp" };
            client.Index<Employee>(emp3, idx => idx.Index("employeeindex7"));
            Employee emp4 = new Employee() { Name = "gazı", SurName = "emir" };
            client.Index<Employee>(emp4, idx => idx.Index("employeeindex7"));
        }
    }

    public class Employee
    {
        public string Name { set; get; }
        public string SurName { set; get; }
    }
}

My search query:

namespace Atom.Customer.Service
{
    public class SearchCustomerService
        : CustomerModuleServiceBase<SearchCustomerRequestEntity, SearchCustomerResponseEntity>
    {
        [MainMethod("1.0")]
        public SearchCustomerResponseEntity MainMethodV1(SearchCustomerRequestEntity searchCustomerPhrase)
        {
            var searchCustomerResponse = new SearchCustomerResponseEntity();
            var elasticClient = new ElasticSearchClient();
            var client = elasticClient.GetClient("employeeindex7");
            var searchResults = client.Search(s => s
                .AllTypes()
                .From(0)
                .Size(10)
                .Query(q => q
                    .Bool(p => p
                        .Must(m => m
                            .QueryString(qs => qs
                                .DefaultField("_all")
                                .Query(searchCustomerPhrase.Query))))));
            searchCustomerResponse.TotalCount = searchResults.HitsMetaData.Total;
            searchCustomerResponse.Customers = searchResults.Hits.Select(h => h.Source).ToList();
            return searchCustomerResponse;
        }
    }
}

forloop · March 31, 2017, 7:03am

Take a look at

ASCII Folding Token Filter
ICU Folding Token Filter (part of the ICU Analysis Plugin).

Incorporating a token filter that performs character folding into an analyzer will solve your problem.
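One way to see what folding does is to run some text through the _analyze API with an ad hoc filter chain. The request body below is only a sketch and assumes the Elasticsearch 5.x request format (older versions name some of these parameters differently); the sample text is made up from the names in the question.

```json
{
  "tokenizer": "standard",
  "filter": ["lowercase", "asciifolding"],
  "text": "Eşarp Yılmaz"
}
```

With this chain the tokens should come back as "esarp" and "yilmaz", which are the same tokens that "esarp yilmaz" produces. That is why a folded field can match both spellings, provided the same folding is applied at index time and at search time.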

By default, a query_string query runs against the "_all" field, which contains the concatenated values of all string fields in the document. You can change which field(s) the query runs against, and you can build your own _all-like fields with copy_to to control exactly which fields are included.
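As a rough sketch of the copy_to approach, the index could be created with a body along these lines. Everything named here is an assumption for illustration: the analyzer name "folded", the catch-all field "full_text", the mapping type "employee", the field names "name" and "surName" (what NEST's default camel-casing would likely produce for the Employee class), and the Elasticsearch 5.x "text" field type (earlier versions use "string").

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folded": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "employee": {
      "properties": {
        "name":      { "type": "text", "copy_to": "full_text" },
        "surName":   { "type": "text", "copy_to": "full_text" },
        "full_text": { "type": "text", "analyzer": "folded" }
      }
    }
  }
}
```

The query in the question would then point at the folded field, for example `.DefaultField("full_text")` instead of `.DefaultField("_all")`. Also note that the code in the question defines its analyzers on "indexname" but indexes the documents into "employeeindex7", so the settings and mapping have to be applied to the index that actually holds the documents.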
