Sorry for the delay in getting back to both of you on this.
First off, you have to define in your index settings which analyzer to use
for each language code that comes in:
curl -XPUT 'http://localhost:9200/data/_settings' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ar": {
          "type": "arabic"
        },
        "hy": {
          "type": "armenian"
        },
        "eu": {
          "type": "basque"
        }
        .....
This sets up the data index with 3 analyzers: when it gets an input of
"ar", it analyzes using the "arabic" analyzer; when it gets "hy", it uses
the "armenian" analyzer, and so on.
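As a rough sketch of the same idea, the settings body can be generated from a table of language codes instead of being written by hand. The names `LANGUAGE_ANALYZERS` and `build_analysis_settings` are made up for this illustration, and only the three analyzers quoted above are included:

```python
import json

# Language code -> built-in analyzer type, copied from the settings above.
LANGUAGE_ANALYZERS = {
    "ar": "arabic",
    "hy": "armenian",
    "eu": "basque",
}


def build_analysis_settings(language_analyzers):
    """Return a settings body with one named analyzer per language code."""
    analyzers = {
        code: {"type": analyzer_type}
        for code, analyzer_type in language_analyzers.items()
    }
    return {"settings": {"analysis": {"analyzer": analyzers}}}


settings_body = build_analysis_settings(LANGUAGE_ANALYZERS)
# This JSON is what would be sent as the request body of the curl call.
print(json.dumps(settings_body, indent=2, sort_keys=True))
```

Adding another language is then a one-line change to the table.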
I then need to specify in my index type which field I want to use as the
analyzer input. I do this when specifying the type mapping:
curl -XPUT 'http://localhost:9200/data/data_language/_mapping' -d '{
  "data_language": {
    "_analyzer": {
      "path": "language"
    },
    ...,
    "properties": {
      ...,
      "language": {
        "type": "string",
        "index": "not_analyzed"
      },
      ...
    }
  }
}'
So when a document is put into ES for the above type with a language like
"ar", the system automatically uses the "arabic" analyzer.
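To make the selection rule concrete, here is a small stand-alone model of it. This is not Elasticsearch code, it only mimics the behaviour described above; `pick_analyzer` and the "default" fallback name are assumptions made for the example:

```python
# Analyzer names configured per language code, as in the settings above.
CONFIGURED_ANALYZERS = {"ar": "arabic", "hy": "armenian", "eu": "basque"}


def pick_analyzer(document, path="language", fallback="default"):
    """Model of the rule: the value found at `path` names the analyzer.

    The fallback for a missing or unknown language is an assumption of this
    sketch, not a statement about what Elasticsearch itself does.
    """
    code = document.get(path)
    return CONFIGURED_ANALYZERS.get(code, fallback)


doc = {"title": "example title", "language": "ar"}
print(pick_analyzer(doc))
```

It is worth testing what your ES version actually does with a document whose language has no matching analyzer before relying on any fallback.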
The main issue with this is that once the content is inserted and analyzed,
that's it; there is no changing the analyzer language later, short of
reindexing the document. The advantage is that you can search over this data
either in a language-agnostic way, or you can specify the locale/language
you want to search in and only get results in that language.
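The two ways of searching could look roughly like this. It is a sketch only: `build_search_body` is a made-up helper, "title" is one of the fields from the original question, and which analyzer gets applied to the query text is not covered here. The language restriction uses a term query, which fits because the language field is not_analyzed:

```python
import json


def build_search_body(text, field="title", language=None):
    """Build a search body, optionally restricted to a single language."""
    text_query = {"match": {field: text}}
    if language is None:
        # Language-agnostic: search the field across all documents.
        return {"query": text_query}
    # Language-specific: the document must also carry this language code.
    return {
        "query": {
            "bool": {
                "must": [
                    text_query,
                    {"term": {"language": language}},
                ]
            }
        }
    }


any_language = build_search_body("example")
arabic_only = build_search_body("example", language="ar")
print(json.dumps(arabic_only, indent=2))
```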
Like anything, it all depends on your workflow: test out the different
ways and figure out which works best.
D
On Thursday, 24 January 2013 11:06:36 UTC, Sapana Patel wrote:
Hi,
I have the same requirement in my project, so I agree with your point 4.
Have you tried this? Did it work for you?
I tried with the Java API but was not able to do this.
If you have done this, can you please guide me and provide sample code
to do this with the Java API?
Thanks
--
Regards
Sapana Patel
On Wednesday, November 28, 2012 9:28:41 PM UTC+5:30, Derry O' Sullivan
wrote:
Hi all,
We have a document index which has a number of fields such as 'title',
'description', 'searches' etc. These can all be provided in the input from
any locale, e.g. I could put in these 3 fields in English and someone else
could add them in French, etc.
I know there have been lots of posts on the group regarding:
1. Setting an index analyzer in advance to analyze 'all content' (won't
   work with multiple languages on input)
2. Using multiple indexes, 1 for each language (would prefer not to have
   to do this from an index/alias maintenance point of view)
3. Using multiple fields (one for each language) and analyzing per field
   (e.g. have a field for title-fr, title-de, title-en and separate the data
   with a specific field analyzer for each field). This would have the
   overhead of having to explicitly create the mapping for a number of
   fields equal to the number of languages you want to support
4. Using single fields (all languages in one) and analyzing per
   language (e.g. have 1 field and then just set the analyzer on indexing
   that piece of content). This seems like the cleanest solution but I'm
   wondering if there is any search/indexing issue with having
   multi-lingual terms within 1 field.
5. Multi-field values (similar to point 3) but explicitly using
   multi-field instead of multiple separated fields.
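For the multiple-fields option (title-fr, title-de, title-en), the mapping overhead can be reduced by generating the properties rather than writing each one out. This is only a sketch: `build_per_language_fields` is a hypothetical helper, and pairing the codes with the built-in "french", "german" and "english" analyzers is an assumption for the example:

```python
# Language code -> analyzer name (assumed pairing for this illustration).
LANGUAGES = {"fr": "french", "de": "german", "en": "english"}


def build_per_language_fields(base_fields, languages):
    """Return mapping properties with one analyzed field per language."""
    properties = {}
    for base in base_fields:
        for code, analyzer in languages.items():
            # e.g. "title-fr" analyzed with the "french" analyzer
            properties[f"{base}-{code}"] = {
                "type": "string",
                "analyzer": analyzer,
            }
    return properties


properties = build_per_language_fields(["title", "description"], LANGUAGES)
for name in sorted(properties):
    print(name, properties[name]["analyzer"])
```

The number of generated fields is the number of base fields times the number of languages, which is the overhead mentioned for this option.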
I'm leaning towards number 4 but would appreciate any feedback people
would have from experience,
Thanks,
Derry