Multi-word synonym search

hi,

I am using Elasticsearch version 0.20.5.

My synonym file contains the following mappings:

abc,oracle america
xyz,abc
xyz,eloqua automation

and my analyzer settings are as follows (the <...> path values are file paths
configured in my code):

{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "mainindexanalyzer" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["lowercase","asciifolding","length","mystopword","my_snowball","mystemmer","myshingle","mysynonym"],
          "char_filter" : ["html_strip"]
        },
        "mainsearchanalyzer" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : ["lowercase","asciifolding","mystopword","mystemmer","myshingle","mysynonym","myworddelimiter"],
          "char_filter" : ["html_strip"]
        }
      },
      "filter" : {
        "mystemmer" : {
          "type" : "stemmer",
          "name" : "english"
        },
        "my_snowball" : {
          "type" : "snowball",
          "language" : "English"
        },
        "mystopword" : {
          "type" : "stop",
          "stopwords_path" : "<stopwordfilepath>",
          "ignore_case" : true
        },
        "myshingle" : {
          "type" : "shingle",
          "max_shingle_size" : 100,
          "min_shingle_size" : 2,
          "output_unigrams" : true
        },
        "mysynonym" : {
          "type" : "synonym",
          "synonyms_path" : "<synonymfilepath>",
          "ignore_case" : true,
          "expand" : true
        },
        "myworddelimiter" : {
          "type" : "word_delimiter",
          "generate_word_parts" : true,
          "generate_number_parts" : true,
          "catenate_words" : true,
          "catenate_numbers" : false,
          "catenate_all" : true,
          "split_on_case_change" : true,
          "preserve_original" : true,
          "split_on_numerics" : true,
          "stem_english_possessive" : true,
          "protected_words_path" : "<protectedwordfilepath>",
          "type_table_path" : "<typetablefilepath>"
        }
      }
    }
  }
}

and my indexed documents are the following:
{id=4, Description=abc}
{id=5, Description=xyz}
{id=1, Description=Eloqua Automation}
{id=3, Description=Oracle america}

When I search for "oracle america" I get 'abc' and 'oracle america'; that is
fine. When I search for 'oracle' I get only 'oracle america'; that is also
fine. But when I search for 'america' I get both "oracle america" and "abc".
Why do both documents come back when I search for 'america'? What is wrong?

Please help.

Thanks
Rajesh

I find the following approach effective if you are doing multi-word
synonyms (synonym phrases):

  • Only apply the synonym expansion at index time
  • Don't apply the synonym filter at search time
  • Use directional synonyms where appropriate. You want to make sure that
    you're not injecting terms that are too general.

For example, you probably want:
oracle america => abc

Otherwise, a more general term like "america" will get injected wherever the
specific term appears.
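
As a rough, untested sketch of what I mean (the index name "test" and the
analyzer/filter names below are just placeholders for illustration; note that
the synonym filter only appears in the index-side analyzer):

curl -XPUT 'localhost:9200/test' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "desc_synonym": {
          "type": "synonym",
          "synonyms": ["oracle america => abc"]
        }
      },
      "analyzer": {
        "desc_index": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "desc_synonym"]
        },
        "desc_search": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}'

You would then point your field's index_analyzer at "desc_index" and its
search_analyzer at "desc_search".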

If you provide a reproducible, curl-based gist with your current and expected
behavior, I can provide more details.

Best Regards,
Paul

hi Paul,

Thank you for the reply.
I am new to Elasticsearch.
Could you please explain "synonym expansion at index time"?
Also, when I use "directional synonyms" I do not get the expected result.
For example, when I use this:
oracle america => abc
and search for "oracle america", I get only "oracle america".
I cannot get "abc".

"abc" is present in an indexed document.
Please help.

Thank you
Rajesh

For a specific field's mapping, you have the ability to specify an
index_analyzer and a search_analyzer. For index-time synonym expansion, you
include the synonym token filter in the index analyzer and not in the search
analyzer.
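
For example, something roughly like this (the index name "test" and type name
"mytype" are placeholders; this also assumes you remove "mysynonym" from
mainsearchanalyzer's filter list):

curl -XPUT 'localhost:9200/test/mytype/_mapping' -d '{
  "mytype": {
    "properties": {
      "Description": {
        "type": "string",
        "index_analyzer": "mainindexanalyzer",
        "search_analyzer": "mainsearchanalyzer"
      }
    }
  }
}'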

Keep in mind that the search engine is term-based. So, if your synonym list is:
abc, oracle america

Here is what happens to various data in the document:
abc -> abc oracle america
oracle america -> oracle america abc
oracle -> oracle
america -> america
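
You can check which terms actually get produced with the _analyze API,
something like this (assuming your index is called "test"):

curl 'localhost:9200/test/_analyze?analyzer=mainindexanalyzer' -d 'oracle america'
curl 'localhost:9200/test/_analyze?analyzer=mainindexanalyzer' -d 'abc'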

Play around with index-time synonym expansion and let us know if you have
more specific questions, along with recreations of your examples.

Thanks,
Paul

hi Paul,

I want "abc" to be a synonym of "oracle america", and "oracle america" to be
a synonym of "abc", and nothing else.
I do not want "oracle" or "america" on their own to match "abc", or the
reverse. How do I define that in the synonym file? Please explain.

I am using the synonym filter only at index time, not at search time, as you
suggested.

Thanks
Rajesh

AFAIK, this may not be possible if you are processing fields with other
text. The core issue is that oracle and america are two separate terms in
your index.

If you used a keyword tokenizer on your field and a keyword tokenizer on
your synonym file, this would work with your example above, but only if your
field contains just "oracle america" or just "abc". If your field became
"Oracle America Company" it would no longer work with the keyword analyzer.

The key thing to understand is that this is a term-based search, and that
ends up conflicting with synonym phrases. Perhaps you should stick to
single-word synonyms?
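
If you do want to try the keyword route mentioned above, a rough, untested
sketch would look something like this (index/analyzer/filter names are
placeholders; if I recall correctly the synonym filter also takes its own
"tokenizer" setting that controls how the synonym entries are parsed):

curl -XPUT 'localhost:9200/test_exact' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "phrase_synonym": {
          "type": "synonym",
          "tokenizer": "keyword",
          "synonyms": ["abc, oracle america"]
        }
      },
      "analyzer": {
        "exact_phrase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "phrase_synonym"]
        }
      }
    }
  }
}'

With a keyword tokenizer the whole field value becomes a single token, so this
only behaves as you want when the field is exactly "abc" or exactly
"oracle america".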

Maybe others have further input, but I don't think there is much more I can
do to help. I recommend playing around with the various options and seeing
which tradeoffs are acceptable.

Thanks,
Paul
