The client runs Elasticsearch 5.4.3, and we are restricted to the standard analyzer they use.
We need a query that returns only exact matches. For example, if the data contains the following business_name values:
Soup Unlimited
!Soup Unlimited
Soup *Unlimited
Soup Un+limited
then how can I query for only Soup *Unlimited? I tried a term query, which returns nothing, and a match query, which returns more results than expected. I also tried query_string, but it also returns the other strings that differ only in their special characters. This seems to be related to the tokenizer. Is there a workaround, something I can use at query level, given that I don't have direct control over the analyzer?
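The tokenizer suspicion can be checked with the _analyze API. This is a minimal sketch, assuming the business_name field is analyzed with the built-in standard analyzer; if the returned tokens contain no `*`, that would explain why the analyzed field cannot tell these names apart:

```
GET /_analyze
{
  "analyzer": "standard",
  "text": "Soup *Unlimited"
}
```

Running the same request with "!Soup Unlimited" and "Soup Unlimited" and comparing the token lists should show whether all three collapse to the same terms.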
Sample data and query:
DELETE /inspectionsshort
PUT /inspectionsshort
PUT /inspectionsshort/_mapping/report
{
  "properties": {
    "business_name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}
POST /inspectionsshort/report/_bulk
{ "index": { "_id": 1 }}
{"business_name":"San Francisco Soup Company"}
{ "index": { "_id": 2 }}
{"business_name":"Soup Unlimited"}
{ "index": { "_id": 3 }}
{"business_name":"TIO CHILOS GRILL"}
{ "index": { "_id": 4 }}
{"business_name":"San Francisco Restaurant"}
{ "index": { "_id": 5 }}
{"business_name":"Soup House"}
{ "index": { "_id": 6 }}
{"business_name":"Soup-or-Salad"}
{ "index": { "_id": 7 }}
{"business_name":"San +Francisco Soup Company"}
{ "index": { "_id": 8 }}
{"business_name":"!Soup Unlimited"}
{ "index": { "_id": 9 }}
{"business_name":"Soup *Unlimited"}
{ "index": { "_id": 10 }}
{"business_name":"Soup Unl*imited"}
{ "index": { "_id": 11 }}
{"business_name":"ASoup Unl*imited"}
{ "index": { "_id": 12 }}
{"business_name":"Soup BUnlimited"}
GET /inspectionsshort/report/_search
{
  "query": {
    "query_string": {
      "query": "Soup \\*Unlimited",
      "fields": [
        "business_name"
      ]
    }
  }
}
The result I get is all three of the following:
{
  "_index": "inspectionsshort",
  "_type": "report",
  "_id": "8",
  "_score": 0,
  "_source": {
    "business_name": "!Soup Unlimited"
  }
},
{
  "_index": "inspectionsshort",
  "_type": "report",
  "_id": "9",
  "_score": 0,
  "_source": {
    "business_name": "Soup *Unlimited"
  }
},
{
  "_index": "inspectionsshort",
  "_type": "report",
  "_id": "2",
  "_score": 0,
  "_source": {
    "business_name": "Soup Unlimited"
  }
}
Note: I can get an exact match with a term query if I search on business_name.keyword.
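For reference, the exact-match request mentioned above looks roughly like this. It is a sketch against the sample index and relies on the keyword sub-field from the mapping, which stores the whole string unanalyzed, so the value must match character for character, including case:

```
GET /inspectionsshort/report/_search
{
  "query": {
    "term": {
      "business_name.keyword": "Soup *Unlimited"
    }
  }
}
```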
How can I also search with a wildcard? For example:
GET /inspectionsshort/report/_search
{
  "query": {
    "query_string": {
      "query": "Soup *Unlimited",
      "fields": [
        "business_name"
      ],
      "use_dis_max": true,
      "tie_breaker": 0,
      "default_operator": "and",
      "auto_generate_phrase_queries": false,
      "max_determinized_states": 10000,
      "enable_position_increments": true,
      "fuzziness": "0",
      "fuzzy_prefix_length": 0,
      "fuzzy_max_expansions": 50,
      "phrase_slop": 0,
      "analyze_wildcard": true,
      "escape": false,
      "split_on_whitespace": true,
      "boost": 1
    }
  }
}
I get names both with and without special characters:
{
  "_index": "inspectionsshort",
  "_type": "report",
  "_id": "2",
  "_score": 1.4778225,
  "_source": {
    "business_name": "Soup Unlimited"
  }
},
{
  "_index": "inspectionsshort",
  "_type": "report",
  "_id": "8",
  "_score": 1.0815521,
  "_source": {
    "business_name": "!Soup Unlimited"
  }
},
{
  "_index": "inspectionsshort",
  "_type": "report",
  "_id": "9",
  "_score": 1.0815521,
  "_source": {
    "business_name": "Soup *Unlimited"
  }
},
{
  "_index": "inspectionsshort",
  "_type": "report",
  "_id": "12",
  "_score": 1.0815521,
  "_source": {
    "business_name": "Soup BUnlimited"
  }
}
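One direction that might work at query level, untested here, is a wildcard query against the unanalyzed business_name.keyword sub-field instead of the analyzed field. The sketch below assumes that a backslash escapes a literal `*` in the wildcard pattern (doubled inside the JSON string) and that the trailing `*` acts as the actual wildcard:

```
GET /inspectionsshort/report/_search
{
  "query": {
    "wildcard": {
      "business_name.keyword": "Soup \\*Unl*"
    }
  }
}
```

If that assumption holds, only names that literally start with "Soup *Unl" should match. The comparison is case-sensitive because the keyword sub-field is not lowercased.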