Greek tones problem

Hello! I am using Elasticsearch mainly for searching Greek words. My main problem is that I can't find a way to handle Greek accents when searching. For example, when I search for παιδια I want to get back both παιδια and παιδιά, and when I search for παιδιά I want to get back both παιδιά and παιδια. This must work for every accented vowel:
α->ά
ο->ό
η->ή etc.
Any suggestions?
Thank you in advance,
Gregory

A synonym filter might work for your situation.

Can I use synonym for characters or only for words?

I have only used synonyms for whole words; I haven't tried them for single characters.

https://www.elastic.co/guide/en/elasticsearch/reference/5.0/analysis-mapping-charfilter.html

Thank you, I have already used it and it doesn't help.

What about the tokenizer? Did you configure one as well?

I use something like this

{
   "settings": {
      "analysis": {
         "analyzer": {
            "my_analyzer": {
               "tokenizer": "keyword",
               "char_filter": [
                  "my_char_filter"
               ]
            }
         },
         "char_filter": {
            "my_char_filter": {
               "type": "mapping",
               "mappings": [
                  "α => α",
                  "α => ά"
               ]
            }
         }
      }
   }
}

but this does not work: I can't add two mappings for the same character.
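
A note on why a mapping like the one above cannot work: each key in a character mapping produces a single replacement, so α cannot map to both α and ά. The usual alternative is to go the other way and fold every accented vowel to its plain form, for indexed text and for queries alike, so both spellings end up identical. A minimal Python sketch of that idea, outside Elasticsearch (the `TONOS_MAP` table and the `fold_tonos` name are illustrative assumptions, not an Elasticsearch API, and only lowercase vowels with tonos are covered):

```python
import unicodedata

# Illustrative only: imitates a character mapping that folds accented
# lowercase Greek vowels to their plain forms (one output per input).
TONOS_MAP = str.maketrans({
    "\u03ac": "\u03b1",  # ά -> α
    "\u03ad": "\u03b5",  # έ -> ε
    "\u03ae": "\u03b7",  # ή -> η
    "\u03af": "\u03b9",  # ί -> ι
    "\u03cc": "\u03bf",  # ό -> ο
    "\u03cd": "\u03c5",  # ύ -> υ
    "\u03ce": "\u03c9",  # ώ -> ω
})


def fold_tonos(text: str) -> str:
    """Return text with accented lowercase Greek vowels replaced by plain ones."""
    # Compose first so that "α + combining accent" becomes a single character
    # that the translation table can recognise.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(TONOS_MAP)


if __name__ == "__main__":
    # Both spellings collapse to the same string, so either one matches the other.
    print(fold_tonos("παιδιά") == fold_tonos("παιδια"))
```

In Elasticsearch terms this would correspond to mappings written in the direction "ά => α" rather than "α => ά", applied by the same analyzer at index time and at search time.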

Hi @Grigorios_Loukidis,

any reason why you don't use the built-in greek analyzer (see docs)?

When I test it with "παιδιά":

POST _analyze
{
    "text": "παιδιά",
    "analyzer": "greek"
}

I get:

{
   "tokens": [
      {
         "token": "παιδ",
         "start_offset": 0,
         "end_offset": 6,
         "type": "<ALPHANUM>",
         "position": 0
      }
   ]
}

I also tried with "άνθρωπος" (to check what happens with leading accents):

POST _analyze
{
    "text": "άνθρωπος",
    "analyzer": "greek"
}

And this returns:

{
   "tokens": [
      {
         "token": "ανθρωπ",
         "start_offset": 0,
         "end_offset": 8,
         "type": "<ALPHANUM>",
         "position": 0
      }
   ]
}

If that doesn't suit your needs you could check the asciifolding token filter.

Daniel

P.S.: No, I am not fluent in Greek. :slight_smile:

Daniel, when I type παιδια I need to get back παιδια first and παιδιά second.
Also, when I type παιδιά I need to get back παιδιά first and παιδια second.
We don't care about leading accents; they are handled the same way. :slight_smile:

Hi @Grigorios_Loukidis,

just to ensure we're on the same page: The analyzer is used both during index time and search time but you have to search in the correct field. So here is a complete example of what I am talking about:

DELETE my_test_index

PUT my_test_index
{
   "mappings": {
      "some_type": {
         "properties": {
            "content": {
               "type": "text",
               "analyzer": "greek"
            }
         }
      }
   }
}

POST /my_test_index/some_type
{
    "content": "παιδιά"
}

POST /my_test_index/some_type
{
    "content": "άνθρωπος"
}

GET /my_test_index/some_type/_search
{
    "query": {
        "match": {
           "content": "παιδια"
        }
    }
}

This returns:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.2876821,
      "hits": [
         {
            "_index": "my_test_index",
            "_type": "some_type",
            "_id": "AVqD7lSD0LG3_Fpe5Jbj",
            "_score": 0.2876821,
            "_source": {
               "content": "παιδιά"
            }
         }
      ]
   }
}

Isn't that what you're after?

Daniel
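
The point about the analyzer running at both index time and search time can be sketched outside Elasticsearch. The toy Python example below is only an illustration under stated assumptions: `analyze` and `ToyIndex` are hypothetical names, and the analysis only lowercases and strips accents, whereas the real greek analyzer also removes stop words and stems tokens.

```python
from __future__ import annotations

import unicodedata
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Toy analyzer: lowercase, strip combining accents, split on whitespace."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.split()


class ToyIndex:
    """Hypothetical in-memory inverted index, not an Elasticsearch client."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.postings: dict[str, set[int]] = defaultdict(set)

    def index(self, content: str) -> None:
        doc_id = len(self.docs)
        self.docs.append(content)
        for token in analyze(content):  # index-time analysis
            self.postings[token].add(doc_id)

    def search(self, query: str) -> list[str]:
        hits: set[int] = set()
        for token in analyze(query):  # the same analysis at search time
            hits |= self.postings.get(token, set())
        return [self.docs[i] for i in sorted(hits)]


if __name__ == "__main__":
    idx = ToyIndex()
    idx.index("παιδιά")
    idx.index("άνθρωπος")
    # The unaccented query is analyzed to the same token as the accented document.
    print(idx.search("παιδια"))
```

Because both sides pass through the same `analyze` function, the accented and unaccented spellings meet at the same token. Querying a field that was indexed with a different analyzer would break that symmetry.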

I will try it and get back to you. Thank you for your help and patience. :slight_smile:

I think your example is right, but I am having difficulty doing the same with NEST in C#:

var response = client1.CreateIndex("thetrialindex", s => s
    .Settings(s1 => s1.NumberOfShards(5).NumberOfReplicas(5))
    .Mappings(m => m.Map("mytype", mt => mt
        .Properties(c => c
            .Text(c1 => c1.Name("ENTITY_DESC").Analyzer("greek"))))));

Hi @Grigorios_Loukidis,

my level of C# expertise matches my level of expertise of the Greek language so bear with me. :slight_smile:

Can you please describe the exact problem with the snippet? Does this code not do what you want or doesn't it compile?

Daniel

I will try to convert it to C# and test it, then I will get back to you.

It returns no results. :smirk:

Hi @Grigorios_Loukidis,

ok, this specific statement will return no results because it actually creates the index.

You can check whether it did what you intended by using the REST API.

If you use Console (which you should; it makes it much easier to explore data in Elasticsearch):

GET /thetrialindex/_mapping

Or in this specific case you can also open http://localhost:9200/thetrialindex/_mapping in your browser.

The next step is then to index some documents and finally to issue a query that targets this field. Something along these lines:

s => s
.Query(q => q
    .Match(p => p.ENTITY_DESC, "παιδιά")
)

(example based on the docs)

Daniel

A huge thank you. It worked, after two weeks of hard testing.

Hi @Grigorios_Loukidis,

you are welcome. I am very glad to hear that I could help you and that it is now working as expected. :slight_smile:

Daniel
