Elasticsearch synonym match involving numeric characters


(Siva Shanmuga Subramanian Murugan) #1

I have documents indexed in an Elasticsearch cluster with the mapping below.
Basically, I have a field named model that holds car model names such as
"Silverado 2500HD", "Silverado 1500HD", "LX 350", etc.

POST /location-test-no-boost
{
  "settings": {
    "analysis": {
      "analyzer": {
        "mysynonym": {
          "tokenizer": "standard",
          "filter": [
            "standard", "lowercase", "stop", "mysynonym"
          ],
          "ignore_case": true
        }
      },
      "filter": {
        "mysynonym": {
          "type": "synonym",
          "synonyms": [
            "2500 HD=>2500HD",
            "chevy silverado=>Silverado"
          ]
        }
      }
    }
  },
  "mappings": {
    "vehicles": {
      "properties": {
        "id": {
          "type": "long",
          "ignore_malformed": true
        },
        "model": {
          "type": "string",
          "index_analyzer": "standard",
          "search_analyzer": "mysynonym"
        }
      }
    }
  }
}
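The rules in the "synonyms" array use the Solr synonym format: comma-separated alternatives on the left of "=>" are replaced by the right-hand side. The Python sketch below is a simplified model of how such a rule is split into tokens, assuming whitespace tokenization of the rule text and lowercasing only when ignore_case is set on the synonym filter itself. `parse_rule` is a hypothetical helper for illustration, not an Elasticsearch or Lucene API.

```python
def parse_rule(rule, ignore_case=False):
    """Split an explicit 'lhs => rhs' synonym rule into token tuples.

    Simplified model (hypothetical helper): rule text is split on
    whitespace, and lowercased only when ignore_case is True.
    Equivalence rules without '=>' (e.g. 'a, b, c') are not modelled.
    """
    lhs, sep, rhs = rule.partition("=>")
    if not sep:
        raise ValueError("not an explicit 'lhs => rhs' rule: %r" % rule)

    def tokens(text):
        words = text.split()
        return tuple(w.lower() for w in words) if ignore_case else tuple(words)

    # Comma-separated alternatives on each side of '=>'.
    sources = [tokens(alt) for alt in lhs.split(",") if alt.strip()]
    targets = [tokens(alt) for alt in rhs.split(",") if alt.strip()]
    return sources, targets


# The two rules from the index settings above.
print(parse_rule("2500 HD=>2500HD"))
# ([('2500', 'HD')], [('2500HD',)])
print(parse_rule("chevy silverado=>Silverado", ignore_case=True))
# ([('chevy', 'silverado')], [('silverado',)])
```

In this model digits are handled like any other characters. What stands out is that, without ignore_case, the left-hand side keeps the uppercase "HD", so it can only match a token stream that still contains "HD".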

A sample document:

POST /location-test-no-boost/vehicles/10
{
  "model": "Silverado 2500HD"
}

When I search with the query string "Chevy Silverado", the synonym maps it
to Silverado and the document is returned. However, when I search with the
query string "2500 HD", I get 0 results. I tried different synonym
combinations involving numbers and concluded that the Elasticsearch synonym
filter does not support numbers. Is this correct?

Is there any way to set up a mapping so that when a user searches for
"2500 HD", the query is mapped to "2500HD"?

--
Please update your bookmarks! We have moved to https://discuss.elastic.co/

You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d6e2183b-fa43-491d-a441-9c442926f492%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(昕玫) #2

Maybe you should debug it like this:

POST /location-test-no-boost/_analyze?pretty&analyzer=mysynonym
"2500 HD"

This returns the analysis result, i.e. the tokens your query turns into. Compare it with:

POST /location-test-no-boost/_analyze?pretty&analyzer=standard
"Silverado 2500HD"

That way you can see where the problem is.
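To make that comparison mechanical, the Python sketch below reads two _analyze responses (the JSON shape with a "tokens" array) and lists the query tokens that never appear among the indexed tokens. The helper names and sample responses are illustrative; the index-side tokens assume the standard analyzer turns "Silverado 2500HD" into "silverado" and "2500hd".

```python
import json


def token_list(analyze_response):
    """Return the token strings from an _analyze JSON response."""
    return [t["token"] for t in json.loads(analyze_response)["tokens"]]


def missing_tokens(query_response, index_response):
    """Query-side tokens that do not occur among the index-side tokens."""
    indexed = set(token_list(index_response))
    return [tok for tok in token_list(query_response) if tok not in indexed]


# Query side: "2500 HD" through mysynonym with the original filter order.
query_resp = '{"tokens": [{"token": "2500", "position": 1}, {"token": "hd", "position": 2}]}'
# Index side: "Silverado 2500HD" through the standard analyzer (assumed output).
index_resp = '{"tokens": [{"token": "silverado", "position": 1}, {"token": "2500hd", "position": 2}]}'

print(missing_tokens(query_resp, index_resp))  # ['2500', 'hd']
```

An empty list would mean every query token has an indexed counterpart. Here neither "2500" nor "hd" does, which is consistent with the 0 results reported in the question.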

(In reply to Siva Shanmuga Subramanian Murugan, Thursday, May 28, 2015, 11:20:51 AM UTC+8)


(昕玫) #3

I think you should swap the order of your filters, like this:

"analysis": {
  "analyzer": {
    "mysynonym": {
      "tokenizer": "standard",
      "filter": [
        "mysynonym", "standard", "lowercase", "stop"
      ],
      "ignore_case": true
    }
  }

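That's because filters work like an assembly line: each filter receives the
output of the one before it. The tokenizer has already split "2500 HD" into
"2500" and "HD", and with "lowercase" running before mysynonym, the synonym
filter receives "2500" and "hd", which no longer match the rule "2500 HD"
("ignore_case" is a setting of the synonym filter, not of the analyzer, so
here the rule stays case-sensitive). With mysynonym first, it sees "HD",
emits "2500HD", and "lowercase" then turns that into "2500hd".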

// before swapping the filter order
$ curl -XPOST "http://myES/my_test/_analyze?pretty&analyzer=mysynonym" -d "2500 HD"
{
  "tokens" : [ {
    "token" : "2500", "start_offset" : 0, "end_offset" : 4, "type" : "", "position" : 1
  }, {
    "token" : "hd", "start_offset" : 5, "end_offset" : 7, "type" : "", "position" : 2
  } ]
}

// after
$ curl -XPOST "http://myES/my_test/_analyze?pretty&analyzer=mysynonym" -d "2500 HD"
{
  "tokens" : [ {
    "token" : "2500hd", "start_offset" : 0, "end_offset" : 7, "type" : "SYNONYM", "position" : 1
  } ]
}
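The before/after outputs above can be reproduced with a toy model of the filter chain. This Python sketch is not Lucene's implementation: the tokenizer is approximated by a whitespace split, the "standard" and "stop" filters are omitted, and the synonym filter replaces exact, case-sensitive matches of multi-token rules, which is assumed to be how "2500 HD=>2500HD" behaves when ignore_case is not set on the synonym filter itself. All function names are hypothetical.

```python
def lowercase(tokens):
    return [t.lower() for t in tokens]


def make_synonym_filter(rules):
    """Build a filter that rewrites exact token sequences.

    rules maps a tuple of input tokens to a list of output tokens.
    Matching is case-sensitive and greedy (longest rule first).
    """
    longest = max(len(lhs) for lhs in rules)

    def synonym(tokens):
        out, i = [], 0
        while i < len(tokens):
            for size in range(min(longest, len(tokens) - i), 0, -1):
                window = tuple(tokens[i:i + size])
                if window in rules:
                    out.extend(rules[window])
                    i += size
                    break
            else:  # no rule matched at position i: keep the token as-is
                out.append(tokens[i])
                i += 1
        return out

    return synonym


def analyze(text, filters):
    tokens = text.split()  # stand-in for the standard tokenizer
    for f in filters:  # each filter consumes the previous filter's output
        tokens = f(tokens)
    return tokens


mysynonym = make_synonym_filter({("2500", "HD"): ["2500HD"]})

print(analyze("2500 HD", [lowercase, mysynonym]))  # ['2500', 'hd']  (original order)
print(analyze("2500 HD", [mysynonym, lowercase]))  # ['2500hd']      (swapped order)
```

Setting "ignore_case": true on the synonym filter definition, rather than on the analyzer, should also make the rule match regardless of order; that is an assumption worth confirming with _analyze.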

I sincerely hope this is helpful to you.



(Siva Shanmuga Subramanian Murugan) #4

Wow, this fixed the issue. Thanks a lot for your help!

Regards
Siva

(In reply to xinm...@163.com, Thursday, May 28, 2015, 1:00:04 AM UTC-7)

