ElasticSearch remove types from index

HilalB · December 19, 2017, 12:25pm

I have an index with several types, one per language, for example school/english and school/german. I am going to upgrade Elasticsearch, so I need to remove the types. One index per type is not necessary. I am not sure whether I should use a custom field or a join field. Could you suggest an approach, please?

Arthur_Silva_Sens · December 19, 2017, 12:37pm

Do they have the same mapping?

If that's the case, I would say the best practice is to use a single index with an extra field, "language".

HilalB · December 19, 2017, 12:47pm

No. They have the same fields (name, address), but the languages are different. For example:

school/english => field:name = "nice school"
school/german => field:name = "schöne Schule"

Arthur_Silva_Sens · December 19, 2017, 1:02pm

If only the data is different, then it shouldn't be a problem.

You could use a single index and have your application filter searches by language.
Let's say you have these documents in your index:

POST /school/_doc
{
  "name":"nice school",
  "language":"english"
}

And

POST /school/_doc
{
  "name":"schöne Schule",
  "language":"german"
}

If you search:

GET /school/_search
{
  "query": {
    "term": { 
      "language":"german"
    }
  }
}

You should get the document you want and nothing more.
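On the application side, the "filter by language" step above could be sketched as a small helper that builds the search body. This is only an illustration: the function name `build_school_query` is hypothetical, and it assumes the `language` field is indexed so that a `term` query on the lowercase value matches (for example, mapped as `keyword`). It wraps the text search in a `bool` query so the language restriction goes in the `filter` clause.

```python
import json


def build_school_query(language=None, name_text=None):
    """Build a search body for the single `school` index.

    language  -- value of the extra `language` field, or None for no filter
    name_text -- text to match against `name`, or None to match everything
    """
    bool_query = {}
    if name_text is not None:
        # Scored full-text part of the search.
        bool_query["must"] = [{"match": {"name": name_text}}]
    if language is not None:
        # Exact, non-scoring restriction to one language.
        bool_query["filter"] = [{"term": {"language": language}}]
    if not bool_query:
        return {"query": {"match_all": {}}}
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    # Body that could be sent to GET /school/_search
    print(json.dumps(build_school_query("german", "Schule"), indent=2, ensure_ascii=False))
```

Leaving `language` as None searches the documents of every language, which may be a reasonable fallback when the application cannot tell which language the user typed.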

HilalB · December 19, 2017, 1:11pm

But I have a lot of fields such as name.exact, name.ngram, name.edgegram, etc. If I add extra fields for each language, such as name.english.exact and name.german.ngram, it would be inefficient.

Arthur_Silva_Sens · December 19, 2017, 3:27pm

No, there is no need for multiple fields. Just one additional field, "language".
You will then have one document per language: each document holds your data in that language, and the extra "language" field works as a flag indicating which language the data is associated with.

POST /school/_doc
{
  "name.exact":"nice school",
  "name.ngram":"ngram value in english",
  "name.edgegram":"edgegram value in english",
  "anotherfield_1":"value1 in english",
  "anotherfield_2":"value2 in english",

  ...

  "anotherfield_n":"valuen in english",
  "language":"english"
}

The same with German or any other language:

POST /school/_doc
{
  "name.exact":"schöne Schule",
  "name.ngram":"ngram value in german",
  "name.edgegram":"edgegram value in german",
  "anotherfield_1":"value1 in german",
  "anotherfield_2":"value2 in german",

  ...

  "anotherfield_n":"valuen in german",
  "language":"german"
}

Or are you saying that exact and ngram are translations of each other? I don't know much German haha

If that's the case, can't you just use English (or whatever language you prefer) for the mapping, and then use the necessary language for the values?
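To make the "one extra field" idea concrete, here is a rough sketch of an index body in which the sub-fields name.exact, name.ngram and name.edgegram are declared once as multi-fields of `name` and shared by every language, with `language` as a separate keyword field. Everything specific here is an assumption for illustration: the helper name, the analyzer and filter names, the gram sizes, and the typeless `mappings.properties` layout (used by newer Elasticsearch versions; older ones nest the properties under a type name).

```python
import json


def build_school_index_body():
    """Return a hypothetical settings + mappings body for the `school` index."""
    analysis = {
        "filter": {
            # Illustrative gram sizes, not taken from the thread.
            "name_ngram": {"type": "ngram", "min_gram": 2, "max_gram": 3},
            "name_edge_ngram": {"type": "edge_ngram", "min_gram": 1, "max_gram": 10},
        },
        "analyzer": {
            "ngram_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "name_ngram"],
            },
            "edge_ngram_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "name_edge_ngram"],
            },
        },
    }
    properties = {
        # Flag that tells the application which language a document is in.
        "language": {"type": "keyword"},
        "name": {
            "type": "text",
            # Sub-fields are defined once and filled from the same `name` value.
            "fields": {
                "exact": {"type": "keyword"},
                "ngram": {"type": "text", "analyzer": "ngram_analyzer"},
                "edgegram": {"type": "text", "analyzer": "edge_ngram_analyzer"},
            },
        },
        "address": {"type": "text"},
    }
    return {"settings": {"analysis": analysis}, "mappings": {"properties": properties}}


if __name__ == "__main__":
    # Body that could be sent when creating the index with PUT /school
    print(json.dumps(build_school_index_body(), indent=2))
```

With a mapping like this, a document only needs to supply `name`, `address` and `language`; the sub-fields are derived from `name` at index time, so no per-language field names are required.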

HilalB · December 19, 2017, 8:06pm

Is this "language" an Elasticsearch attribute, like an analyzer?

How do you detect the language when a query comes in? (Language-detection libraries are not reliable, because "die" is a word in both German and English.)

One field, name.exact, could be analyzed with a custom analyzer using the stemmer token filter, which supports multiple languages. Is that a good option?
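For reference on the stemmer idea: as far as I know, each instance of the `stemmer` token filter is configured with a single `language` value, so covering several languages means defining one filter and one analyzer per language rather than one filter for all of them. The sketch below builds such analysis settings; the helper name and the analyzer and filter naming scheme are hypothetical, and how the analyzers are attached to fields is left open.

```python
import json


def build_stemming_analysis(languages):
    """Build analysis settings with one stemmer filter and analyzer per language."""
    filters = {}
    analyzers = {}
    for lang in languages:
        filter_name = f"{lang}_stemmer"
        # One stemmer filter instance per language.
        filters[filter_name] = {"type": "stemmer", "language": lang}
        analyzers[f"{lang}_stemmed"] = {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", filter_name],
        }
    return {"analysis": {"filter": filters, "analyzer": analyzers}}


if __name__ == "__main__":
    print(json.dumps(build_stemming_analysis(["english", "german"]), indent=2))
```

An analyzer defined this way could, for example, be named in the `analyzer` parameter of a `match` query once the application knows or guesses the query language.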
