Querying synonyms


(elastic) #1

I'm trying to retrieve all documents using synonyms:

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "british,english",
            "queen,monarch monarch"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}

PUT http://localhost:9200/my_index/_bulk
{"index":{"_index":"my_index","_type":"test"}}
{"name":"queen"}
{"index":{"_index":"my_index","_type":"test"}}
{"name":"monarch monarch"}

POST http://localhost:9200/my_index/_search?size=500
{
  "query": {
    "query_string": {
      "query": "monarch monarch",
      "analyzer": "my_synonyms"
    }
  }
}

Analyzer result for "monarch monarch":

{
  "tokens": [
    {
      "token": "monarch",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "queen",
      "start_offset": 0,
      "end_offset": 15,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "monarch",
      "start_offset": 8,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

These requests are from Sense. When I query for "monarch monarch", only the "monarch monarch" document is returned. I expected to get the document containing "queen" as well, since the two are defined as synonyms.


(Abdon Pijpelink) #2

By default, the query_string query splits the query text on whitespace before analysis. This means the analyzer does not receive "monarch monarch" as a single input (which would also produce a query for the synonym "queen"), but two separate inputs, "monarch" and "monarch", neither of which has a synonym defined on its own.
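
To illustrate the difference, here is a toy Python sketch. It is not how Elasticsearch implements synonyms; `SYNONYM_GROUPS`, `analyze` and `query_terms` are hypothetical names made up for this example, and the analyzer is reduced to lowercasing, whitespace tokenizing and phrase lookup. It only shows why a multi-word synonym cannot match when each word is analyzed separately.

```python
# Toy model only: mirrors the two synonym rules from the index settings.
SYNONYM_GROUPS = [
    [("british",), ("english",)],
    [("queen",), ("monarch", "monarch")],
]


def contains_phrase(tokens, phrase):
    """True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase
               for i in range(len(tokens) - n + 1))


def analyze(text):
    """Lowercase, tokenize, and add every term of any matching synonym group."""
    tokens = text.lower().split()
    terms = set(tokens)
    for group in SYNONYM_GROUPS:
        if any(contains_phrase(tokens, phrase) for phrase in group):
            for phrase in group:
                terms.update(phrase)
    return terms


def query_terms(query, split_on_whitespace):
    """Analyze the query as one string, or word by word when splitting first."""
    chunks = query.split() if split_on_whitespace else [query]
    terms = set()
    for chunk in chunks:
        terms |= analyze(chunk)
    return terms


print(sorted(query_terms("monarch monarch", split_on_whitespace=True)))
# ['monarch']
print(sorted(query_terms("monarch monarch", split_on_whitespace=False)))
# ['monarch', 'queen']
```

With splitting enabled, the two-word phrase never reaches the synonym lookup in one piece, so "queen" is never added to the query.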

There are two solutions. The first is to set split_on_whitespace to false:

POST my_index/_search?size=500
{
  "query": {
    "query_string": {
      "query": "monarch monarch",
      "analyzer": "my_synonyms",
      "split_on_whitespace": false
    }
  }
}

Or use a match query instead:

POST my_index/_search?size=500
{
  "query": {
    "match": {
      "name": {
        "query": "monarch monarch",
        "analyzer": "my_synonyms"
      }
    }
  }
}
