Help with Synonyms


(Daniel Yim) #1

Hi everyone,

I am relatively new to Elasticsearch and am having trouble getting my
synonym filter to work. Could you take a look at the settings below and tell
me where I am going wrong?

I expect a search for "aids" to return the same results as a search for
"retrovirology", but that is not happening.

Thanks!

curl -XDELETE "http://localhost:9200/personsearch"

curl -XPUT "http://localhost:9200/personsearch" -d'
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "XYZSynAnalyzer": {
            "tokenizer": "standard",
            "filter": [
              "XYZSynFilter"
            ]
          }
        },
        "filter": {
          "XYZSynFilter": {
            "type": "synonym",
            "synonyms": [
              "aids, retrovirology"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "xyzemployee": {
      "_all": {
        "analyzer": "XYZSynAnalyzer"
      },
      "properties": {
        "firstName": {
          "type": "string"
        },
        "lastName": {
          "type": "string"
        },
        "middleName": {
          "type": "string",
          "include_in_all": false,
          "index": "not_analyzed"
        },
        "specialty": {
          "type": "string",
          "analyzer": "XYZSynAnalyzer"
        }
      }
    }
  }
}'

curl -XPUT "http://localhost:9200/personsearch/xyzemployee/1" -d'
{
  "firstName": "Don",
  "middleName": "W.",
  "lastName": "White",
  "specialty": "Adult Retrovirology"
}'

curl -XPUT "http://localhost:9200/personsearch/xyzemployee/2" -d'
{
  "firstName": "Terrance",
  "middleName": "G.",
  "lastName": "Gartner",
  "specialty": "Retrovirology"
}'

curl -XPUT "http://localhost:9200/personsearch/xyzemployee/3" -d'
{
  "firstName": "Carter",
  "middleName": "L.",
  "lastName": "Taylor",
  "specialty": "Pediatric Retrovirology"
}'

curl -XGET "http://localhost:9200/personsearch/xyzemployee/_search?pretty=true" -d'
{
  "query": {
    "match": {
      "specialty": "retrovirology"
    }
  }
}'

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2e227a33-d935-4d22-89fb-57b59358c89d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Ivan Brusic) #2

Your issue is casing. You are applying only the synonym filter, which
matches case-sensitively by default, and the standard tokenizer does not
lowercase anything, so "Retrovirology" in your documents never matches the
rule "aids, retrovirology". You can either set ignore_case to true on the
synonym filter or apply a lowercase filter before the synonym filter. I
prefer the latter approach, since I like to have all my analyzed tokens
lowercased.
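The casing problem can be reproduced offline with a toy model of the analysis chain. This is a minimal Python sketch, not Elasticsearch code: the helper names (standard_like_tokenizer, lowercase_filter, synonym_filter, analyze, matches) are hypothetical, the tokenizer is a rough regex stand-in for the standard tokenizer, and "match" simply means the document and query share at least one term. It compares the analyzer as first posted with one that lowercases before the synonym filter; the ignore_case option is not modeled.

```python
import re

# The single rule from the question: "aids, retrovirology".
SYNONYM_GROUPS = [{"aids", "retrovirology"}]


def standard_like_tokenizer(text):
    """Rough stand-in for the standard tokenizer: split on non-word
    characters. Like the real tokenizer, it does not change case."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def synonym_filter(tokens):
    """Case-sensitive lookup (the default behavior being modeled).
    Keeps each token and appends the other members of its synonym group."""
    out = []
    for tok in tokens:
        out.append(tok)
        for group in SYNONYM_GROUPS:
            if tok in group:
                out.extend(sorted(group - {tok}))
    return out


def analyze(text, filters):
    tokens = standard_like_tokenizer(text)
    for f in filters:
        tokens = f(tokens)
    return tokens


def matches(doc, query, filters):
    # Toy match query: a hit if any query term equals any indexed term.
    return bool(set(analyze(doc, filters)) & set(analyze(query, filters)))


original = [synonym_filter]                  # XYZSynAnalyzer as first posted
fixed = [lowercase_filter, synonym_filter]   # lowercase before synonym

print(analyze("Adult Retrovirology", original))          # ['Adult', 'Retrovirology']
print(analyze("Adult Retrovirology", fixed))             # ['adult', 'retrovirology', 'aids']
print(matches("Adult Retrovirology", "aids", original))  # False
print(matches("Adult Retrovirology", "aids", fixed))     # True
```

The order of the filters is the whole fix in this model: the synonym rule is written in lowercase, so the tokens have to be lowercased before they reach it.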

Also, you should apply the synonym filter only at index time. That means
creating two similar analyzers, one with the synonym filter and one
without, and assigning them via index_analyzer and search_analyzer.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string

Cheers,

Ivan



(Daniel Yim) #3

Thank you! That solved the initial issue.

Could you expand on why I would need two analyzers? I did what you asked,
but I am unsure of the reason behind it and would like to learn.

Here are my updated settings:

curl -XPUT "http://localhost:9200/personsearch" -d'
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "XYZSynAnalyzer": {
            "tokenizer": "whitespace",
            "filter": [
              "lowercase",
              "XYZSynFilter"
            ]
          },
          "MyAnalyzer": {
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "stop"
            ]
          }
        },
        "filter": {
          "XYZSynFilter": {
            "type": "synonym",
            "synonyms": [
              "aids, retrovirology"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "xyzemployee": {
      "_all": {
        "analyzer": "XYZSynAnalyzer"
      },
      "properties": {
        "firstName": {
          "type": "string"
        },
        "lastName": {
          "type": "string"
        },
        "middleName": {
          "type": "string",
          "include_in_all": false,
          "index": "not_analyzed"
        },
        "specialty": {
          "type": "string",
          "index_analyzer": "XYZSynAnalyzer",
          "search_analyzer": "MyAnalyzer"
        }
      }
    }
  }
}'
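To see why separate index and search analyzers still find the synonyms, here is a minimal Python sketch of the updated settings, under a simplified model: XYZSynAnalyzer is a whitespace split, lowercase, then synonym expansion; MyAnalyzer is a regex word split, lowercase, then a stop filter. The stop list, the helper names and the dictionary-based inverted index are hypothetical simplifications, and the "standard" token filter is treated as a no-op. Because the synonyms are already stored in the index, the search side only has to produce "aids" or "retrovirology".

```python
import re
from collections import defaultdict

SYNONYM_GROUPS = [{"aids", "retrovirology"}]
# Hypothetical, abbreviated stop list; the real "stop" filter's list is longer.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def xyz_syn_analyzer(text):
    """Index-time chain: whitespace tokenizer -> lowercase -> synonym."""
    out = []
    for tok in text.split():          # whitespace tokenizer
        tok = tok.lower()             # lowercase filter
        out.append(tok)
        for group in SYNONYM_GROUPS:  # synonym filter: add the other group members
            if tok in group:
                out.extend(sorted(group - {tok}))
    return out


def my_analyzer(text):
    """Search-time chain: word split -> lowercase -> stop (no synonyms)."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    return [t for t in tokens if t not in STOP_WORDS]


docs = {
    1: "Adult Retrovirology",
    2: "Retrovirology",
    3: "Pediatric Retrovirology",
}

# Inverted index for the "specialty" field, built with the index analyzer.
index = defaultdict(set)
for doc_id, specialty in docs.items():
    for term in xyz_syn_analyzer(specialty):
        index[term].add(doc_id)


def search(query):
    # Toy match query with OR semantics: union of the postings of every
    # query term, where the query goes through the search analyzer only.
    hits = set()
    for term in my_analyzer(query):
        hits |= index.get(term, set())
    return sorted(hits)


print(search("aids"))                      # [1, 2, 3]
print(search("AIDS"))                      # [1, 2, 3]
print(search("the pediatric"))             # [3]
print(xyz_syn_analyzer("Retrovirology,"))  # ['retrovirology,']
print(my_analyzer("Retrovirology,"))       # ['retrovirology']
```

The last two lines show a side effect of mixing tokenizers: the whitespace tokenizer leaves punctuation attached, so text such as "Retrovirology," would be indexed as "retrovirology," while MyAnalyzer would search for "retrovirology". Using the same tokenizer in both analyzers avoids that mismatch.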



(Ivan Brusic) #4

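A couple of reasons. The biggest issue is multi-word synonyms: the query
parser tokenizes the query before analysis is applied, so a multi-word
synonym never reaches the synonym filter as a single unit. Scoring can also
be affected (for example, a rare synonym can get a higher IDF weight than
the word the user actually typed), and the results can be screwy. Here is
a better write-up: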

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
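The multi-word problem can be illustrated with a minimal Python sketch. It assumes a hypothetical rule "aids, acquired immunodeficiency syndrome" (not part of the original settings) and a query parser that tokenizes the query (here: splits on whitespace) before analysis, which is the behavior Ivan describes rather than that of every Elasticsearch query type. Positions are ignored and every helper name is hypothetical. When the synonym filter runs at query time it only ever sees one word, so the multi-word rule never fires; when it runs at index time the document already carries every variant.

```python
# Hypothetical multi-word rule, for illustration only.
SYNONYM_GROUPS = [[("aids",), ("acquired", "immunodeficiency", "syndrome")]]


def synonym_filter(tokens):
    """Greedy longest-match synonym expansion over a token list.
    Returns a flat bag of terms; positions are ignored in this sketch."""
    out, i = [], 0
    while i < len(tokens):
        best = None
        for group in SYNONYM_GROUPS:
            for phrase in group:
                n = len(phrase)
                if tuple(tokens[i:i + n]) == phrase and (best is None or n > len(best[1])):
                    best = (group, phrase)
        if best:
            group, phrase = best
            for p in group:          # emit the words of every equivalent phrase
                out.extend(p)
            i += len(phrase)
        else:
            out.append(tokens[i])
            i += 1
    return out


def analyze(text, use_synonyms):
    tokens = text.lower().split()
    return synonym_filter(tokens) if use_synonyms else tokens


def query_terms(query, use_synonyms):
    # Split on whitespace FIRST, then analyze each piece on its own,
    # so the synonym filter never sees more than one word at a time.
    terms = []
    for piece in query.split():
        terms.extend(analyze(piece, use_synonyms))
    return terms


def matches(doc, query, index_synonyms, query_synonyms):
    return bool(set(analyze(doc, index_synonyms)) & set(query_terms(query, query_synonyms)))


q = "acquired immunodeficiency syndrome"
print(analyze(q, use_synonyms=True))      # ['aids', 'acquired', 'immunodeficiency', 'syndrome']
print(query_terms(q, use_synonyms=True))  # ['acquired', 'immunodeficiency', 'syndrome']
print(matches("AIDS", q, index_synonyms=False, query_synonyms=True))  # False
print(matches("AIDS", q, index_synonyms=True, query_synonyms=False))  # True
```

The first line shows the rule does work when the filter sees the whole phrase; the second shows it silently does nothing once the parser has split the query.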

--
Ivan



(Ivan Brusic) #5

I appreciate that you want to know why you shouldn't use synonyms
at query time. I couldn't find the following articles when I wrote my last
response (I read them a while back and I have waaaaay too many bookmarks),
but I finally found them:

http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/

--
Ivan


(Daniel Yim) #6

Ivan, thank you for feeding my curiosity! The first article really gave me
an "a-ha!" moment when I saw the images of synonym matching drawn as
directed graphs. It gave me some insight into why my multi-token synonyms
were being expanded the way they were.
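The directed-graph idea from the first article can be sketched in a few lines of Python. This is a toy model, not Lucene's TokenStream API: each token carries a position, and an injected synonym reuses the position of the token it expands (the equivalent of a position increment of 0). The helper names are hypothetical and the tokenizer is a plain lowercase whitespace split. A phrase query then matches either path through the resulting graph.

```python
SYNONYM_GROUPS = [{"aids", "retrovirology"}]


def analyze_with_positions(text):
    """Return (term, position) pairs. An injected synonym gets the same
    position as the token it came from, so the stream forms a small graph
    with two parallel arcs at that position instead of a straight line."""
    out = []
    for pos, tok in enumerate(text.lower().split()):
        out.append((tok, pos))
        for group in SYNONYM_GROUPS:
            if tok in group:
                out.extend((syn, pos) for syn in sorted(group - {tok}))
    return out


def phrase_match(doc_tokens, phrase):
    """True if the phrase terms occur at consecutive positions."""
    positions = {}
    for term, pos in doc_tokens:
        positions.setdefault(term, set()).add(pos)
    words = phrase.lower().split()
    starts = positions.get(words[0], set())
    return any(
        all(start + k in positions.get(w, set()) for k, w in enumerate(words))
        for start in starts
    )


doc = analyze_with_positions("Adult Retrovirology")
print(doc)                                       # [('adult', 0), ('retrovirology', 1), ('aids', 1)]
print(phrase_match(doc, "adult aids"))           # True
print(phrase_match(doc, "adult retrovirology"))  # True
print(phrase_match(doc, "aids adult"))           # False
```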

