I don't understand. What can you do?
There are two options for me:
For that last item, how would you proceed? Pick a random analyzer? Try more than one analyzer? Use no analyzer at all? (You may want to use ngrams to try to make that work, but I guess it would generate a lot of false positives.)
More than one analyzer is multifield. You're right. Behind the scenes, it's like having X fields, one for each language. But instead of providing a JSON document like:
{
"content_fr":"mon contenu francais",
"content_en":"mon contenu francais",
"content_de":"mon contenu francais"
}
You will be able to provide:
{
"content":"mon contenu francais"
}
So in terms of _source storage, you won't pay the price three times. In terms of the inverted index, yes, you will consume space for content.fr, content.en and content.de.
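To illustrate how those sub fields could be searched, here is a rough sketch of a query_string query that targets content.fr, content.en and content.de and boosts one of them. The search term and the boost value of 2 are made up for the example, and this assumes the sub fields were mapped with language analyzers under those names, so treat it as a starting point rather than a tested recipe:

```json
{
  "query": {
    "query_string": {
      "query": "contenu",
      "fields": ["content.fr^2", "content.en", "content.de"]
    }
  }
}
```

Each sub field should be searched with its own analyzer, and the ^2 suffix weights matches in content.fr more heavily than matches in the other two.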
Makes sense?
--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr
On 8 November 2013 at 19:22:55, Paweł Młynarczyk (zwarios@gmail.com) wrote:
Thanks for your reply.
You suggest creating a multi field equivalent of the '_all' field, but isn't it a waste to analyze all the language-dependent data with every analyzer? I mean, if I created that kind of custom '_all' field and put the aggregated data from all the language-dependent fields there, then I would end up having X '_all' fields (where X is the number of languages), right? Additionally, would I have any option to boost a particular, more important field? (In my case, every language has more than one field and some of them are more important.)
Paweł Młynarczyk
On Friday, 8 November 2013 at 17:51:41 UTC+1, David Pilato wrote:
I would not use the _all field for that. I would probably disable it and then use the multi_field type on your "content" field.
Probably one sub field per language.
See: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-multi-field-type.html
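As a minimal sketch of what that could look like, the index creation body below disables _all and declares "content" as a multi_field with one sub field per language. The type name "doc", the sub field names fr, en and de, and the choice of the built-in french, english and german analyzers are assumptions made for the example, so adapt them to your own data:

```json
{
  "mappings": {
    "doc": {
      "_all": { "enabled": false },
      "properties": {
        "content": {
          "type": "multi_field",
          "fields": {
            "content": { "type": "string" },
            "fr": { "type": "string", "analyzer": "french" },
            "en": { "type": "string", "analyzer": "english" },
            "de": { "type": "string", "analyzer": "german" }
          }
        }
      }
    }
  }
}
```

With a mapping along these lines, the sub fields would be addressed as content.fr, content.en and content.de at search time, while the document itself only carries a single "content" value.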
HTH
--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr
On 8 November 2013 at 16:40:23, Paweł Młynarczyk (zwa...@gmail.com) wrote:
Hi
I've got multilingual documents to index. I want to create a full text search, so the first thing that came to mind was to use a query string query with the _all field. The problem is that the _all field has its own analyzer, so the field-specific analyzers are not used (the data is not analyzed properly). Is there a way to use each field's appropriate analyzer when copying data to _all, instead of just reanalyzing it with the _all analyzer?
Creating a separate index for each language is not a good solution in my case, because I've got millions of documents, and every one of them contains fields in more than one language plus a number of language-independent fields. That means I would end up with heavily duplicated data in every index.
Thanks in advance
Paweł Młynarczyk
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.