How to model data?

hym · May 29, 2018, 1:22pm

Hi, I have an index in which some fields need to be translated into several languages.
Which approach do you think is better?

First approach:
{
  "field1": {
    "en": "book",
    "de": "buch",
    "fr": "livre"
  },
  "field2": {
    "en": "water",
    "de": "wasser",
    "fr": "eau"
  },
  "field3": {
    "en": "head",
    "de": "kopf",
    "fr": "tête"
  }
}

Or
Second approach:
{
  "en": {
    "field1": "book",
    "field2": "water",
    "field3": "head"
  },
  "de": {
    "field1": "buch",
    "field2": "wasser",
    "field3": "kopf"
  },
  "fr": {
    "field1": "livre",
    "field2": "eau",
    "field3": "tête"
  }
}
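Both layouts hold exactly the same information; the second is just the first pivoted from field-then-language to language-then-field. A minimal sketch of that pivot, using a hypothetical helper name (`by_field_to_by_language`) that is not part of any library:

```python
from typing import Dict

# {outer_key: {inner_key: text}}
Translations = Dict[str, Dict[str, str]]


def by_field_to_by_language(doc: Translations) -> Translations:
    """Hypothetical helper: pivot {field: {lang: text}} into {lang: {field: text}}."""
    result: Translations = {}
    for field, translations in doc.items():
        for lang, text in translations.items():
            # Create the per-language object on first use, then add the field.
            result.setdefault(lang, {})[field] = text
    return result


first_approach = {
    "field1": {"en": "book", "de": "buch", "fr": "livre"},
    "field2": {"en": "water", "de": "wasser", "fr": "eau"},
    "field3": {"en": "head", "de": "kopf", "fr": "tête"},
}

second_approach = by_field_to_by_language(first_approach)
print(second_approach["de"])  # {'field1': 'buch', 'field2': 'wasser', 'field3': 'kopf'}
```

Because the conversion is lossless, the choice between the two layouts is really about how the documents will be queried, not about what they can store.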

Which one is better and why?

Thanks in advance

JKhondhu · May 29, 2018, 5:30pm

@hym
How do you intend to serve the results to your users?

Do you want to create a synonym search, whereby a user who searches for "auto" (IT) also finds results for "car" (EN) and "coche" (ES)?

hym · May 30, 2018, 9:04am

@JKhondhu, thanks for the reply.
No, in this case I am not planning to use synonyms, because I have already configured synonyms in each language's analyzer.
My question is more about the data model. It is similar to the by-feature vs. by-type debate about project structure nowadays:
in our case, the first approach is by feature (grouped by field) and the second approach is by type (grouped by language).

Please also tell me more about what you have in mind; maybe I did not get your hint and question.
Thanks in advance

JKhondhu · May 30, 2018, 3:47pm

How do you intend to serve the results to your users?
How do you expect your data to be queried? By feature or by type? Then we can work backwards.

hym · May 30, 2018, 4:39pm

@JKhondhu, in my queries I will always have a locale (en, de, ...), which is what I call the type here. For example, if a request comes in with the locale set to en, the query clauses will target the en fields.
Could you also tell me what you have in mind for queries by feature?
One more question: if we model the data by type, the same field names (field1, field2, field3) will be defined several times in the mapping of the same index, each with a different analyzer. Could that be a problem?

Thanks
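A minimal sketch of what a by-language mapping could look like, built as a Python dict. It assumes a typeless (7.x-style) mapping; on 6.x, which was current when this thread was written, the `properties` would have to sit under a mapping type name. The locale-to-analyzer table is an assumption that uses Elasticsearch's built-in `english`, `german` and `french` analyzers, and `build_mapping` / `full_field_paths` are hypothetical helpers. The point it illustrates is that the repeated leaf names live under different objects, so the indexed fields are distinct dotted paths such as `en.field1` and `de.field1`:

```python
from typing import Dict, List

# Assumed locale -> built-in Elasticsearch language analyzer.
LANGUAGE_ANALYZERS: Dict[str, str] = {"en": "english", "de": "german", "fr": "french"}
FIELDS: List[str] = ["field1", "field2", "field3"]


def build_mapping(fields: List[str], analyzers: Dict[str, str]) -> dict:
    """Hypothetical helper: one object per locale, each leaf field with that locale's analyzer."""
    properties: dict = {}
    for locale, analyzer in analyzers.items():
        leaf_fields = {}
        for field in fields:
            leaf_fields[field] = {"type": "text", "analyzer": analyzer}
        properties[locale] = {"properties": leaf_fields}
    return {"mappings": {"properties": properties}}


def full_field_paths(mapping: dict) -> List[str]:
    """Hypothetical helper: list the dotted field paths that queries would target."""
    paths = []
    for locale, obj in mapping["mappings"]["properties"].items():
        for field in obj["properties"]:
            paths.append(f"{locale}.{field}")
    return paths


mapping = build_mapping(FIELDS, LANGUAGE_ANALYZERS)
paths = full_field_paths(mapping)
print(paths[:3])  # ['en.field1', 'en.field2', 'en.field3']
```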

JKhondhu · May 31, 2018, 10:20am

@hym,

"en": {
  "field1": "book",
  "field2": "water",
  "field3": "head"
  ...

This schema is much easier to search: in a multi_match query you can target all the English fields at once with the wildcard en.*. So my recommendation would absolutely be to model this by language.
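A minimal sketch of such a search body, built as a Python dict. The `multi_match` query and wildcard field names are standard Elasticsearch features; the helper name `build_locale_query` and the `SUPPORTED_LOCALES` set are hypothetical, added so that an unexpected locale fails early instead of silently matching no fields:

```python
SUPPORTED_LOCALES = {"en", "de", "fr"}  # assumption: the locales present in the index


def build_locale_query(text: str, locale: str) -> dict:
    """Hypothetical helper: a multi_match clause restricted to one locale's fields."""
    if locale not in SUPPORTED_LOCALES:
        raise ValueError(f"unsupported locale: {locale!r}")
    return {
        "query": {
            "multi_match": {
                "query": text,
                # The wildcard expands to every field under the locale object,
                # e.g. en.field1, en.field2, en.field3.
                "fields": [f"{locale}.*"],
            }
        }
    }


body = build_locale_query("book", "en")
print(body["query"]["multi_match"]["fields"])  # ['en.*']
```

The resulting dict can be sent as the body of a `_search` request. Adding a new language then only means adding a new object to the mapping; the query code stays the same.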

system · June 28, 2018, 10:20am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.