Synonym multi words search

Rajesh_Tarle · March 1, 2013, 12:02pm

Hi,

I am using Elasticsearch version 0.20.5.

My synonym file contains the following:

abc,oracle america
xyz,abc
xyz,eloqua automation

My analyzer settings are as follows:

"{\n"
+ "  \"index\" : {\n"
+ "    \"analysis\" : {\n"
+ "      \"analyzer\" : {\n"
+ "        \"mainindexanalyzer\" : {\n"
+ "          \"type\" : \"custom\",\n" // whitespace standard
+ "          \"tokenizer\" : \"standard\",\n"
// ,"myshingle" ,"mystopword" "length", "length",["lowercase","asciifolding","myworddelimiter","my_snowball"]
+ "          \"filter\" : [\"lowercase\",\"asciifolding\",\"length\",\"mystopword\",\"my_snowball\",\"mystemmer\",\"myshingle\",\"mysynonym\"],\n"
+ "          \"char_filter\" : [\"html_strip\"]\n"
+ "        },\n"
+ "        \"mainsearchanalyzer\" : {\n"
+ "          \"type\" : \"custom\",\n"
+ "          \"tokenizer\" : \"whitespace\",\n"
// ["lowercase","asciifolding","mystopword","mysynonym","myworddelimiter","my_snowball"]
+ "          \"filter\" : [\"lowercase\",\"asciifolding\",\"mystopword\",\"mystemmer\",\"myshingle\",\"mysynonym\",\"myworddelimiter\"],\n"
+ "          \"char_filter\" : [\"html_strip\"]\n"
+ "        }\n"
+ "      },\n"
+ "      \"filter\" : {\n"
+ "        \"mystemmer\" : {\n"
+ "          \"type\" : \"stemmer\",\n"
+ "          \"name\" : \"english\"\n"
+ "        },\n"
+ "        \"my_snowball\" : {\n"
+ "          \"type\" : \"snowball\",\n"
+ "          \"language\" : \"English\"\n"
+ "        },\n"
+ "        \"mystopword\" : {\n"
+ "          \"type\" : \"stop\",\n"
+ "          \"stopwords_path\" : \"" + stopwordfilepath + "\",\n"
// + "          \"stopwords_path\" : \"F:/resources/stopwordeng.txt\",\n"
+ "          \"ignore_case\" : true\n"
+ "        },\n"
+ "        \"myshingle\" : {\n"
+ "          \"type\" : \"shingle\",\n"
+ "          \"max_shingle_size\" : 100,\n"
+ "          \"min_shingle_size\" : 2,\n"
+ "          \"output_unigrams\" : true\n"
+ "        },\n"
+ "        \"mysynonym\" : {\n"
+ "          \"type\" : \"synonym\",\n"
+ "          \"synonyms_path\" : \"" + synonymfilepaths_to_p + "\",\n"
+ "          \"ignore_case\" : true,\n"
+ "          \"expand\" : true\n"
+ "        },\n"
+ "        \"myworddelimiter\" : {\n"
+ "          \"type\" : \"word_delimiter\",\n"
+ "          \"generate_word_parts\" : true,\n"
+ "          \"generate_number_parts\" : true,\n"
+ "          \"catenate_words\" : true,\n"
+ "          \"catenate_numbers\" : false,\n"
+ "          \"catenate_all\" : true,\n"
+ "          \"split_on_case_change\" : true,\n"
+ "          \"preserve_original\" : true,\n"
+ "          \"split_on_numerics\" : true,\n"
+ "          \"stem_english_possessive\" : true,\n"
+ "          \"protected_words_path \" : \"" + protectedwordfilepath + "\",\n"
+ "          \"type_table_path \" : \"" + typetablefilepath + "\"\n"
+ "        }\n"
+ "      }\n"
+ "    }\n"
+ "  }\n"
+ "}";

My index contains the following documents:
{id=4, Description=abc}
{id=5, Description=xyz}
{id=1, Description=Eloqua Automation}
{id=3, Description=Oracle america}

When I search for "oracle america" I get 'abc' and 'oracle america', which is fine.
When I search for 'oracle' I get only 'oracle america', which is also fine.
But when I search for 'america' I get both "oracle america" and "abc".
Why are both returned for 'america'? What is wrong?

Please help.

Thanks
Rajesh

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

ppearcy · March 1, 2013, 11:34pm

I find the following approach effective if you are doing multi-word
synonyms (synonym phrases):

- Only apply the synonym expansion at index time.
- Don't apply the synonym filter at search time.
- Use directional synonyms where appropriate. You want to make sure that
  you're not injecting terms that are too general.

For example, you probably want:
oracle america => abc

Otherwise, a more general term "america" will get injected when you see
something specific.

If you provide a reproducible curl-based gist with your current and
expected behavior, I could provide more details.

Best Regards,
Paul

Rajesh_Tarle · March 4, 2013, 8:51am

Hi Paul,

Thank you for the reply. I am new to Elasticsearch.
Could you please explain "synonym expansion at index time"?
Also, when I use directional synonyms I do not get the expected results.
For example, with this rule:
oracle america => abc
a search for "oracle america" returns only "oracle america".
I do not get abc, even though "abc" is present in an indexed document.

Please help.

Thank you,
Rajesh

ppearcy · March 4, 2013, 9:00pm

For a specific field's mapping, you can specify an index_analyzer and a
search_analyzer. For index-time synonym expansion, you include the synonym
token filter in the index analyzer and not in the search analyzer.
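
As a rough illustration of the mapping described above, the sketch below builds the field mapping as a Python dictionary and prints it as JSON. The analyzer names and the Description field come from the original post; the type name "doc" is a placeholder, not something stated in the thread. It also assumes that "mysynonym" has been removed from the filter list of mainsearchanalyzer.

```python
import json

# Hypothetical type name; "Description" and the analyzer names are from the original post.
TYPE_NAME = "doc"


def build_mapping(index_analyzer, search_analyzer):
    """Return a field mapping that analyzes text differently at index and search time."""
    return {
        TYPE_NAME: {
            "properties": {
                "Description": {
                    "type": "string",
                    "index_analyzer": index_analyzer,    # synonym filter included here
                    "search_analyzer": search_analyzer,  # synonym filter left out here
                }
            }
        }
    }


mapping = build_mapping("mainindexanalyzer", "mainsearchanalyzer")
print(json.dumps(mapping, indent=2))
```

The printed JSON is what would be sent as the mapping body; how it is sent (curl, Java client) is left open here.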

Keep in mind the search engine is term based. So, if your synonym list is:
abc, oracle america

Here is what happens to various data in the document:
abc -> abc oracle america
oracle america -> oracle america abc
oracle -> oracle
america -> america
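
The expansion listed above can be imitated with a toy inverted index. This is only a simplified sketch, not the actual synonym token filter: it ignores token positions, stemming, shingles, and stop words, and the function names are made up for illustration. It shows why a term-based search for "america" can reach the document that contains only "abc".

```python
from collections import defaultdict

# One equivalence rule from the thread: "abc, oracle america" with expand=true.
SYNONYM_GROUPS = [["abc", "oracle america"]]


def expand(text):
    """Toy index-time expansion: if the text contains a phrase from a group,
    add the individual terms of every phrase in that group."""
    terms = text.lower().split()
    padded = " " + " ".join(terms) + " "
    for group in SYNONYM_GROUPS:
        if any(" " + phrase + " " in padded for phrase in group):
            for phrase in group:
                for term in phrase.split():
                    if term not in terms:
                        terms.append(term)
    return terms


def build_index(docs):
    """Map each term to the set of document ids that contain it after expansion."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in expand(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Search without synonyms: a document matches if it holds any query term."""
    hits = set()
    for term in query.lower().split():
        hits |= index.get(term, set())
    return sorted(hits)


DOCS = {4: "abc", 3: "Oracle america"}
INDEX = build_index(DOCS)
print(expand("abc"))             # ['abc', 'oracle', 'america']
print(search(INDEX, "america"))  # [3, 4]
```

Because "abc" is indexed together with the separate terms "oracle" and "america", the single term "america" is enough to match document 4 in this model.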

Play around with index time synonym expansion and let us know if you have
more specific questions and recreations of examples.

Thanks,
Paul

Rajesh_Tarle · March 5, 2013, 5:07am

Hi Paul,

I want "abc" to match "oracle america" and "oracle america" to match
"abc", and nothing else.
I do not want "oracle" or "america" alone to match "abc", or the reverse.
How do I define this in the synonym file? Please explain.
As you suggested, I am applying synonyms only at index time, not at
search time.

Thanks,
Rajesh

ppearcy · March 5, 2013, 4:45pm

AFAIK, this may not be possible if you are processing fields with other
text. The core issue is that oracle and america are two separate terms in
your index.

If you used a keyword tokenizer on your field and a keyword tokenizer on
your synonym file, this would work with your example above, but only if your
field contains just "oracle america" or just "abc". If your field became
"Oracle America Company", it would no longer work with the keyword analyzer.

The key thing to understand is that this is a term-based search, and
that ends up conflicting with synonym phrases. Perhaps you should
stick to single-word synonyms?
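
The keyword-tokenizer idea can be sketched in the same toy style. This is an assumption-laden model, not Elasticsearch itself: the whole field value is treated as one lowercased term, synonyms apply only on an exact match of that term, and the function names are invented for illustration.

```python
# Whole-value synonyms from the thread; every entry is matched as a single term.
SYNONYM_GROUP = {"abc", "oracle america"}


def keyword_terms(field_value):
    """Toy keyword analysis: the whole lowercased value is one term,
    expanded to its synonym group only on an exact match."""
    term = " ".join(field_value.lower().split())
    return set(SYNONYM_GROUP) if term in SYNONYM_GROUP else {term}


def matches(field_value, query):
    """A query matches only if its single keyword term is among the field's terms."""
    query_term = " ".join(query.lower().split())
    return query_term in keyword_terms(field_value)


print(matches("abc", "oracle america"))          # True
print(matches("Oracle america", "abc"))          # True
print(matches("abc", "america"))                 # False
print(matches("Oracle America Company", "abc"))  # False
```

The last call shows the tradeoff mentioned above: once the field holds any extra text, the exact-match synonym no longer applies.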

Maybe others have further input, but I don't think there is much more I can
do to help. I recommend playing around with the various options and seeing
which tradeoffs are acceptable.

Thanks,
Paul
