I'm not sure whether the following behavior is intended. Consider this index definition and these documents:
PUT /test
{
  "settings": {
    "analysis": {
      "filter": {
        "syn": {
          "synonyms": ["ysl, yves saint laurent"],
          "type": "synonym_graph"
        }
      },
      "analyzer": {
        "index": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "trim"]
        },
        "query": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "trim", "syn"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text", "analyzer": "index", "search_analyzer": "query" }
    }
  }
}
POST test/_doc/1
{
  "field1": "saint nicolas"
}
POST test/_doc/2
{
  "field1": "new ysl bag"
}
POST test/_doc/3
{
  "field1": "new yves saint laurent shoes"
}
When I run this query:
GET test/_search
{
  "query": {
    "match": {
      "field1": {
        "query": "yves saint laurent",
        "operator": "or"
      }
    }
  }
}
I get back documents 2 and 3, but not document 1. Why not? Since I specified the operator "or", shouldn't the presence of the token saint be enough to return document 1?
If I run this query instead:
GET test/_search
{
  "query": {
    "match": {
      "field1": {
        "query": "yves saint",
        "operator": "or"
      }
    }
  }
}
I get back documents 1 and 3, as expected.
For reference, when I run:
GET test/_analyze
{
  "analyzer": "query",
  "text": "yves saint laurent"
}
I get back:
{
  "tokens" : [
    {
      "token" : "ysl",
      "start_offset" : 0,
      "end_offset" : 18,
      "type" : "SYNONYM",
      "position" : 0,
      "positionLength" : 3
    },
    {
      "token" : "yves",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "saint",
      "start_offset" : 5,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "laurent",
      "start_offset" : 11,
      "end_offset" : 18,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
The token saint is present in this output, which makes me unsure whether the behavior described above is expected.
I am using Elasticsearch 7.10.
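The _analyze output only shows the tokens, not the query built from them. One way to inspect the actual query is the validate API with explain=true. The request below is a sketch against the same test index, and the exact format of the returned explanation string may vary between versions:
GET test/_validate/query?explain=true
{
  "query": {
    "match": {
      "field1": {
        "query": "yves saint laurent",
        "operator": "or"
      }
    }
  }
}
If the explanation shows a phrase clause for the multi-term side of the synonym, that would be consistent with the match query parameter auto_generate_synonyms_phrase_query. It defaults to true and creates phrase queries for multi-term synonyms, in which case the token saint on its own would not match document 1.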