Hello @S-Dragon0302
As per the table:
POST /fruits/_doc
{
  "message": ["usa", "apple"]
}
POST /fruits/_doc
{
  "message": ["usa", "apple", "banana"]
}
POST /fruits/_doc
{
  "message": ["usa", "apple", "apple care"]
}
POST /fruits/_doc
{
  "message": ["usa", "apple care"]
}
POST /fruits/_doc
{
  "message": ["usa", "apple", "banana", "apple care"]
}
# ambiguous word = None
GET /fruits/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "message": "usa" }},
        { "match": { "message": "apple" }}
      ],
      "must_not": [
        { "match": { "message": "banana" }}
      ]
    }
  }
}
# ambiguous word = apple care
GET /fruits/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "message": "usa" }},
        { "match": { "message": "apple" }}
      ],
      "must_not": [
        { "match": { "message": "banana" }}
      ],
      "filter": {
        "script": {
          "script": {
            "source": """
              def kws = doc['message.keyword'].size() == 0 ? [] : doc['message.keyword'];
              return !kws.contains('apple care') || kws.contains('apple');
            """,
            "lang": "painless"
          }
        }
      }
    }
  }
}
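To make the difference between the two searches concrete, here is a rough local sketch in Python that needs no cluster. It is only an approximation: `tokens` is a hypothetical stand-in for the standard analyzer (lowercase, split on whitespace), and `script_filter` mirrors the Painless condition by checking the exact array values, which is what `message.keyword` would hold under the default dynamic mapping.

```python
# Local sketch (no Elasticsearch needed): approximates how the two searches
# above treat the five sample documents.
DOCS = [
    ["usa", "apple"],
    ["usa", "apple", "banana"],
    ["usa", "apple", "apple care"],
    ["usa", "apple care"],
    ["usa", "apple", "banana", "apple care"],
]


def tokens(values):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return {tok for value in values for tok in value.lower().split()}


def match(values, term):
    """Approximates a single-term `match` on the analyzed `message` field."""
    return term in tokens(values)


def bool_query(values):
    """First search: must match usa and apple, must_not match banana."""
    return (
        match(values, "usa")
        and match(values, "apple")
        and not match(values, "banana")
    )


def script_filter(values):
    """Mirrors the Painless condition on the exact `message.keyword` values."""
    return "apple care" not in values or "apple" in values


def bool_query_with_script(values):
    """Second search: the same bool query plus the script filter."""
    return bool_query(values) and script_filter(values)


first = [d for d in DOCS if bool_query(d)]
second = [d for d in DOCS if bool_query_with_script(d)]

print(first)   # documents kept by the first search
print(second)  # documents kept once the script filter is added
```

Under these assumptions the first search also keeps `["usa", "apple care"]`, because the analyzed field contains the token `apple`, while the script filter drops it since that document has no exact `apple` value.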
Thanks!!
Scripts can achieve this, but the performance is too poor for my case. I have hundreds of billions of documents, and the storage is at the PB scale.
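One script-free direction that might be worth testing is to replace the script with exact `term` clauses on `message.keyword`, since `term` lookups go through the index instead of running a script per document. The sketch below is an assumption, not something confirmed in this thread: it builds a hypothetical query body as a Python dict and evaluates it locally against the five sample documents. It assumes `message.keyword` holds the exact, case-sensitive array values, and it has only been checked against these five samples, not measured at scale.

```python
import json

# Hypothetical script-free variant: exact `term` clauses on `message.keyword`
# instead of analyzed `match` clauses plus a Painless script.
QUERY = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"message.keyword": "usa"}},
                {"term": {"message.keyword": "apple"}},
            ],
            "must_not": [
                {"term": {"message.keyword": "banana"}},
            ],
        }
    }
}

# The five sample documents from the thread.
DOCS = [
    ["usa", "apple"],
    ["usa", "apple", "banana"],
    ["usa", "apple", "apple care"],
    ["usa", "apple care"],
    ["usa", "apple", "banana", "apple care"],
]


def term_only(values):
    """Local evaluation of QUERY: every term clause is an exact value lookup."""
    clauses = QUERY["query"]["bool"]
    required = [c["term"]["message.keyword"] for c in clauses["filter"]]
    excluded = [c["term"]["message.keyword"] for c in clauses["must_not"]]
    return all(r in values for r in required) and not any(
        e in values for e in excluded
    )


matches = [d for d in DOCS if term_only(d)]

print(json.dumps(QUERY, indent=2))  # body to send to GET /fruits/_search
print(matches)
```

On the sample data this keeps the same two documents as the script version. It is not equivalent in general: `term` is not analyzed, so differently cased values or a value such as "banana split" would be treated differently than with `match`.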
Maybe of interest: in this demo, users can choose which interpretation of the ambiguous search term “ice” they want and search for only that. It uses clustering and binary vectors, so it may require changes to both indexing and user behaviour, but it does solve this sort of problem.