ElasticSearch remove types from index

I have an index with several types, one per language, for example school/english and school/german. I am going to upgrade Elasticsearch, so I need to remove the types. One index per type is not necessary. I am not sure whether I should use a custom field or a join field. Could you suggest an approach, please?

Do they have the same mapping?

If that's the case, I would say the best practice is to use a single index with an extra field "language".
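As a rough sketch of how the existing types could be folded into that extra field, something like the request below might work. It follows the pattern shown in the Elasticsearch guide on the removal of mapping types and assumes a version where ctx._type is still available in reindex scripts (5.x/6.x). The destination index name school_v2 is hypothetical, and on older 5.x releases the script key may need to be "inline" instead of "source".

```
# Copy every document from the multi-type index into a single-type index,
# keeping the old type name in a "language" field.
POST _reindex
{
  "source": { "index": "school" },
  "dest":   { "index": "school_v2" },
  "script": {
    "source": """
      ctx._source.language = ctx._type;
      ctx._id = ctx._type + '-' + ctx._id;
      ctx._type = 'doc';
    """
  }
}
```

Prefixing the id with the old type name is there to avoid collisions when the same id exists under both school/english and school/german.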

No. They have the same fields, name and address, but the languages are different. For example:

school/english => field:name = "nice school"
school/german => field:name = "schöne Schule"

If only the data is different, then it shouldn't be a problem.

You could use one single index and, in your application, filter your searches by language.
Let's say you have these documents in your index:

POST /school/_doc
{
  "name":"nice school",
  "language":"english"
}

And

POST /school/_doc
{
  "name":"schöne Schule",
  "language":"german"
}

If you search:

GET /school/_search
{
  "query": {
    "term": { 
      "language":"german"
    }
  }
}

You should get the document you want and nothing more.
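In practice you would probably combine the language filter with a full-text query. A minimal sketch, assuming "language" is mapped as keyword (so the term query matches the exact value) and "name" is a text field:

```
# Full-text match on name, restricted to German documents.
# The filter clause does not affect scoring and can be cached.
GET /school/_search
{
  "query": {
    "bool": {
      "must":   { "match": { "name": "Schule" } },
      "filter": { "term":  { "language": "german" } }
    }
  }
}
```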

But I have a lot of fields like name.exact, name.ngram, name.edgegram, etc. If I add an extra field for each language, such as name.english.exact or name.german.ngram, that would be inefficient.

No, there is no need for multiple fields. Just one single additional field, "language".
You will then have one document per language: each document holds your data in that language, and the extra "language" field works as a flag indicating which language the data is in.

POST /school/_doc
{
  "name.exact":"nice school",
  "name.ngram":"ngram value in english",
  "name.edgegram":"edgegram value  in english",
  "anotherfield_1":"value1  in english",
  "anotherfield_2":"value2  in english",
  
  ...

  "anotherfield_n":"valuen  in english",
  "language":"english"
}

The same with German or any other language:

POST /school/_doc
{
  "name.exact":"schöne Schule",
  "name.ngram":"ngram value in german",
  "name.edgegram":"edgegram value  in german",
  "anotherfield_1":"value1  in german",
  "anotherfield_2":"value2  in german",
  
  ...

  "anotherfield_n":"valuen  in german",
  "language":"german"
}
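If name.exact, name.ngram and name.edgegram are multi-fields of name (my assumption, I have not seen your mapping), they are declared once in the mapping and each document only needs to send "name" and "language". A sketch in typeless 7.x syntax; the index name school_v2, the analyzer and tokenizer names, and the gram sizes are all made up for illustration:

```
# One mapping for all languages: the sub-fields are derived from "name"
# at index time, so nothing is duplicated per language.
PUT /school_v2
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "ngram_tok": { "type": "ngram", "min_gram": 3, "max_gram": 3 },
        "edge_tok":  { "type": "edge_ngram", "min_gram": 2, "max_gram": 10 }
      },
      "analyzer": {
        "name_ngram":    { "type": "custom", "tokenizer": "ngram_tok", "filter": ["lowercase"] },
        "name_edgegram": { "type": "custom", "tokenizer": "edge_tok",  "filter": ["lowercase"] }
      }
    }
  },
  "mappings": {
    "properties": {
      "language": { "type": "keyword" },
      "name": {
        "type": "text",
        "fields": {
          "exact":    { "type": "keyword" },
          "ngram":    { "type": "text", "analyzer": "name_ngram" },
          "edgegram": { "type": "text", "analyzer": "name_edgegram", "search_analyzer": "standard" }
        }
      }
    }
  }
}
```

On 6.x the "properties" block would sit under a type name inside "mappings".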

Or are you saying that exact and ngram are translations of each other? I don't know much German, haha.

If that's the case, can't you use English (or whatever language you prefer) just for the field names in the mapping, and then store the values in whichever language is needed?

Is this "language" an Elasticsearch attribute, like analyzer?

How do you detect the language when a query comes in? (Libraries are non-deterministic, because "die" exists in both German and English.)

A single field, name.exact, could be analyzed with a custom analyzer using a stemmerTokenFilter that supports multiple languages. Is that a good option?
