That schema could be fine, depending on the types of searches you are
doing. Are you searching for a term in a particular language and want to
retrieve all of its translations? One possible variant that doesn't store all
terms twice would be this:
{
"en":{
"_id": "50922fca1e028a0200000010",
"translation": "use",
"score": "0",
"createDate": "2012-11-01T08:16:10.531Z",
"author": "anonymous"
},
"tr": {
"_id": "50922fca1e028a0200000010",
"translation": "kullanmak",
"score": "0",
"createDate": "2012-11-01T08:16:10.531Z",
"author": "anonymous"
},
"ar": {
"_id": "509568056a435ff40c000003",
"translation": "استخدم",
"score": "0",
"createDate": "2012-11-03T18:52:53.144Z",
"author": "anonymous"
},
"pt": {
"_id": "509568186a435ff40c00000b",
"translation": "usar",
"score": "0",
"createDate": "2012-11-03T18:53:12.834Z",
"author": "anonymous"
},
"ru": {
"_id": "5095682b6a435ff40c000017",
"translation": "использовать",
"score": "0",
"createDate": "2012-11-03T18:53:31.195Z",
"author": "anonymous"
},
"de": {
"_id": "5095683e6a435ff40c000027",
"translation": "verwenden",
"score": "0",
"createDate": "2012-11-03T18:53:50.812Z",
"author": "anonymous"
},
"fr": {
"_id": "509568516a435ff40c00003b",
"translation": "utiliser",
"score": "0",
"createDate": "2012-11-03T18:54:09.416Z",
"author": "anonymous"
}
}
Then, if you need a translation from English, you can search the
en.translation field, and if you need a translation from Turkish, search
the tr.translation field. However, in this case you lose
some flexibility if you ever need to capture a term that doesn't
translate back to the original term, like:
employ->kullanmak->use
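As a rough illustration of how a document in the array-based shape could be converted to this language-keyed variant, here is a minimal Python sketch. It assumes the `_source` layout quoted further down in this thread (`lang`, `term`, `translations`); the function names `to_language_keyed` and `field_for` are made up for this example and are not part of Elasticsearch.

```python
def to_language_keyed(source):
    """Re-key a term document by language code.

    Assumption: `source` has a root `lang`/`term` pair plus a
    `translations` list whose items each carry their own `lang`.
    """
    keyed = {
        # The root term becomes an entry under its own language code.
        source["lang"]: {
            "translation": source["term"],
            "author": source.get("author"),
            "createDate": source.get("createDate"),
        }
    }
    for entry in source.get("translations", []):
        # Copy every field except `lang`, which is now the key.
        keyed[entry["lang"]] = {k: v for k, v in entry.items() if k != "lang"}
    return keyed


def field_for(lang):
    """Name of the field to search when translating from `lang`."""
    return f"{lang}.translation"
```

Keying by language keeps only one translation per language, so in this sketch a later entry for the same language overwrites an earlier one.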
On Saturday, November 3, 2012 3:05:52 PM UTC-4, cubuzoa wrote:
Here is my current index document:
{
"_index":"u7gpslhmzbop3usl",
"_type":"terms",
"_id":"50922fca1e028a020000000f",
"_score":3.2512918,
"_source":{
"lang":"en",
"term":"use",
"translations":[
{
"_id":"50922fca1e028a0200000010",
"lang":"tr",
"translation":"kullanmak",
"score":"0",
"createDate":"2012-11-01T08:16:10.531Z",
"author":"anonymous"
},
{
"_id":"509568056a435ff40c000003",
"lang":"ar",
"translation":"استخدم",
"score":"0",
"createDate":"2012-11-03T18:52:53.144Z",
"author":"anonymous"
},
{
"_id":"509568186a435ff40c00000b",
"lang":"pt",
"translation":"usar",
"score":"0",
"createDate":"2012-11-03T18:53:12.834Z",
"author":"anonymous"
},
{
"_id":"5095682b6a435ff40c000017",
"lang":"ru",
"translation":"использовать",
"score":"0",
"createDate":"2012-11-03T18:53:31.195Z",
"author":"anonymous"
},
{
"_id":"5095683e6a435ff40c000027",
"lang":"de",
"translation":"verwenden",
"score":"0",
"createDate":"2012-11-03T18:53:50.812Z",
"author":"anonymous"
},
{
"_id":"509568516a435ff40c00003b",
"lang":"fr",
"translation":"utiliser",
"score":"0",
"createDate":"2012-11-03T18:54:09.416Z",
"author":"anonymous"
}
],
"author":"anonymous",
"createDate":"2012-11-01T08:16:10.531Z"
}
}
For the term *use (en)* I have added 6 translations. When I added kullanmak (tr) as a translation of the term *use (en)*, I created a new term like this:
{
"_index":"u7gpslhmzbop3usl",
"_type":"terms",
"_id":"50922fca1e028a0200000011",
"_score":3.2512918,
"_source":{
"lang":"tr",
"term":"kullanmak",
"translations":[
{
"_id":"50922fca1e028a0200000012",
"lang":"en",
"translation":"use",
"score":"0",
"createDate":"2012-11-01T08:16:10.549Z",
"author":"anonymous"
}
],
"author":"anonymous",
"createDate":"2012-11-01T08:16:10.549Z"
}
}
In other words, whenever I create a term and a translation, I also create the reverse translation, for example en->tr and tr->en. This doubles the number of documents I store.
When I search for "kullanmak" or "use", one result is returned. Is doubling the documents a disadvantage here?
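To make the doubling concrete, here is a minimal Python sketch of the reverse-document step described above. It assumes the `_source` layout of the documents in this message; `reverse_documents` is a hypothetical helper name, and `_id` values are left out on the assumption that the application assigns them elsewhere.

```python
def reverse_documents(source):
    """Build one reverse term document per translation in `source`."""
    reversed_docs = []
    for entry in source.get("translations", []):
        reversed_docs.append({
            # The translation becomes the root term of the new document.
            "lang": entry["lang"],
            "term": entry["translation"],
            # The original root term becomes its only translation.
            "translations": [{
                "lang": source["lang"],
                "translation": source["term"],
                "score": entry.get("score", "0"),
                "createDate": entry.get("createDate"),
                "author": entry.get("author"),
            }],
            "author": entry.get("author"),
            "createDate": entry.get("createDate"),
        })
    return reversed_docs
```

With this approach a term with n translations produces n extra documents, one per target language.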
On Sat, Nov 3, 2012 at 5:48 PM, Igor Motov imo...@gmail.com wrote:
I am still not quite clear about how you are using these documents. Could
you explain the reasons for the hierarchical structure and why a document
can have 100 levels? What's the relationship between a translation on the
5th level, for example, and the root term? How is it different from a
translation on the 4th level? Can you just flatten all translations into an
array, or would you lose some information by doing it this way? It's
typically better to flatten the document, and there are many ways to do it,
but without knowing your use case, it's difficult to give a
good recommendation here. Would something like this work for you, for example?
{
"translation.cs":"ananasovník",
"translation.de":"ananas",
"translation.en":"pineapple",
"translation.es":"ananas",
"translation.he":"אננס",
"translation.nl":"ananas",
"translation.ru":"ананас",
"translation.tr":"ananas"
}
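One way such flattening might look in practice is sketched below in Python. It assumes the nested shape from the first post, with `translations` being a list at each level and each item carrying `translation` and `outlang`; the helper names `flatten_translations` and `to_flat_document` are hypothetical. The `depth` value is kept in the intermediate records in case the level turns out to matter.

```python
def flatten_translations(node, depth=1):
    """Yield one flat record per translation found at any depth."""
    for child in node.get("translations", []):
        yield {
            "translation": child["translation"],
            "lang": child["outlang"],
            "depth": depth,  # level of this translation below the root term
        }
        # Recurse into the child's own translations, one level deeper.
        yield from flatten_translations(child, depth + 1)


def to_flat_document(root):
    """Collapse a nested term document into `translation.<lang>` fields.

    Assumption: the root term is stored under its own `inlang`, and when
    a language occurs more than once the first one encountered is kept.
    """
    flat = {f"translation.{root['inlang']}": root["term"]}
    for record in flatten_translations(root):
        flat.setdefault(f"translation.{record['lang']}", record["translation"])
    return flat
```

The flat document no longer records which translation was derived from which, so this only fits if that relationship is not needed at query time.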
On Saturday, November 3, 2012 2:57:21 AM UTC-4, cubuzoa wrote:
Yes, that is exactly as you understood. There can be n levels in one
document. For some terms, the document can be 100 levels deep. For this
reason, I used separate 2-level documents, as in my example above.
On Fri, Nov 2, 2012 at 6:04 PM, Igor Motov imo...@gmail.com wrote:
If I understood your question correctly, you are trying to compare
indexing of one large document with 10 subdocuments embedded into it to
indexing these 10 documents as separate small documents. To answer this
question, I need to know a little bit more about what types of queries you
are going to make against this index and what types of subdocuments you are
going to store. Are they going to look like documents in your first post?
How many levels are you expecting to have in a single document?
On Friday, November 2, 2012 10:28:06 AM UTC-4, cubuzoa wrote:
Thanks for your reply, it works! May I ask a related question?
Let's say I have two indices of the same size, I-1 and I-2. I-1 has 10 documents
and I-2 has one document. I-2 has fewer documents because they are embedded. Is
there any disadvantage to a larger document count in Elasticsearch?
Thanks in advance
On Fri, Nov 2, 2012 at 3:34 PM, Igor Motov imo...@gmail.com wrote:
You can use Query String (http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query.html)
or Multi Match (http://www.elasticsearch.org/guide/reference/query-dsl/multi-match-query.html) queries. For example:
{
"query": {
"query_string": {
"query": "someterm",
"fields": ["*.translation"]
}
}
}
or
{
"query": {
"multi_match": {
"query": "someterm",
"fields": ["*.translation"]
}
}
}
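If the request bodies are built in application code, a small helper along these lines could produce either variant. This is only a sketch in Python: `translation_query` is a hypothetical name, and the result is a plain dictionary that still has to be sent to the search endpoint by whatever client you use.

```python
import json


def translation_query(term, kind="multi_match", fields=("*.translation",)):
    """Build a search body for either of the two query types above."""
    if kind not in ("multi_match", "query_string"):
        raise ValueError(f"unsupported query type: {kind}")
    return {"query": {kind: {"query": term, "fields": list(fields)}}}


# Print the body that would be sent for a Turkish search term.
print(json.dumps(translation_query("kullanmak"), ensure_ascii=False, indent=2))
```

Because query_string parses its input as query syntax, multi_match is likely the safer default when the term comes straight from user input.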
On Friday, November 2, 2012 3:02:59 AM UTC-4, cubuzoa wrote:
I have an index document structure like the one below:
{
"term":"some term",
"inlang":"some lang",
"translations" : [
{
"translation":"some translation",
"outlang":"some lang",
"translations" : [
{
"translation":"some translation 1",
"outlang": "some lang 1",
"translations" : [...]
}
]
},
...
]
}
I want to find a translation in such documents. However, this
translation can exist at any level of the document. Is it possible to
search for a term dynamically with Elasticsearch?
For example,
{
"query": {
"*.translation":"searchterm"
}
}
Thanks in advance