Search field with special characters, reserved characters


#1

The client runs Elasticsearch 5.4.3, and we are restricted to the standard analyzer the client uses.

We need to query for an exact match. For example, if the data has the following business_name values:
Soup Unlimited
!Soup Unlimited
Soup *Unlimited
Soup Un+limited

then how can I query for only Soup *Unlimited? I tried a term query, which returns nothing, and a match query, which returns more results than expected. I also tried query_string, but it returns all of the strings that have special characters. This seems to be related to the tokenizer. Is there a workaround if I don't have direct control over the analyzer? Is there anything I can use at query level?

Sample data and query:

DELETE /inspectionsshort

PUT /inspectionsshort

PUT /inspectionsshort/_mapping/report
{
  "properties": {
    "business_name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}

POST /inspectionsshort/report/_bulk
{ "index": { "_id": 1 }}
{"business_name":"San Francisco Soup Company"}
{ "index": { "_id": 2 }}
{"business_name":"Soup Unlimited"}
{ "index": { "_id": 3 }}
{"business_name":"TIO CHILOS GRILL"}
{ "index": { "_id": 4 }}
{"business_name":"San Francisco Restaurant"}
{ "index": { "_id": 5 }}
{"business_name":"Soup House"}
{ "index": { "_id": 6 }}
{"business_name":"Soup-or-Salad"}
{ "index": { "_id": 7 }}
{"business_name":"San +Francisco Soup Company"}
{ "index": { "_id": 8 }}
{"business_name":"!Soup Unlimited"}
{ "index": { "_id": 9 }}
{"business_name":"Soup *Unlimited"}
{ "index": { "_id": 10 }}
{"business_name":"Soup Unl*imited"}
{ "index": { "_id": 11 }}
{"business_name":"ASoup Unl*imited"}
{ "index": { "_id": 12 }}
{"business_name":"Soup BUnlimited"}

GET /inspectionsshort/report/_search
{
  "query": {
    "query_string": {
      "query": "Soup \\*Unlimited",
      "fields": [
        "business_name"
      ]
    }
  }
}

The result I get is all three of the following:

  {
    "_index": "inspectionsshort",
    "_type": "report",
    "_id": "8",
    "_score": 0,
    "_source": {
      "business_name": "!Soup Unlimited"
    }
  },
  {
    "_index": "inspectionsshort",
    "_type": "report",
    "_id": "9",
    "_score": 0,
    "_source": {
      "business_name": "Soup *Unlimited"
    }
  },
  {
    "_index": "inspectionsshort",
    "_type": "report",
    "_id": "2",
    "_score": 0,
    "_source": {
      "business_name": "Soup Unlimited"
    }
  }
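
A quick way to check the tokenizer theory is the _analyze API, which shows the tokens that the field's analyzer produces. A minimal sketch, assuming the index and mapping above, so that business_name uses the default standard analyzer:

```console
GET /inspectionsshort/_analyze
{
  "field": "business_name",
  "text": "Soup *Unlimited"
}
```

With the standard analyzer this should return only the tokens soup and unlimited. The * is dropped, and the same holds for the ! in "!Soup Unlimited". The three hits above therefore look identical on the analyzed business_name field, whichever query type is used.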

Note that I am able to get an exact match with a term query if I search on business_name.keyword.
How can I also search with a wildcard? For example:
GET /inspectionsshort/report/_search
{
  "query": {
    "query_string": {
      "query": "Soup *Unlimited",
      "fields": [
        "business_name"
      ],
      "use_dis_max": true,
      "tie_breaker": 0,
      "default_operator": "and",
      "auto_generate_phrase_queries": false,
      "max_determinized_states": 10000,
      "enable_position_increments": true,
      "fuzziness": "0",
      "fuzzy_prefix_length": 0,
      "fuzzy_max_expansions": 50,
      "phrase_slop": 0,
      "analyze_wildcard": true,
      "escape": false,
      "split_on_whitespace": true,
      "boost": 1
    }
  }
}

I get names both with and without special characters:

  {
    "_index": "inspectionsshort",
    "_type": "report",
    "_id": "2",
    "_score": 1.4778225,
    "_source": {
      "business_name": "Soup Unlimited"
    }
  },
  {
    "_index": "inspectionsshort",
    "_type": "report",
    "_id": "8",
    "_score": 1.0815521,
    "_source": {
      "business_name": "!Soup Unlimited"
    }
  },
  {
    "_index": "inspectionsshort",
    "_type": "report",
    "_id": "9",
    "_score": 1.0815521,
    "_source": {
      "business_name": "Soup *Unlimited"
    }
  },
  {
    "_index": "inspectionsshort",
    "_type": "report",
    "_id": "12",
    "_score": 1.0815521,
    "_source": {
      "business_name": "Soup BUnlimited"
    }
  }
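
One option for the wildcard case is a wildcard query on the business_name.keyword sub-field, which is not analyzed. A minimal sketch, assuming the mapping above and assuming that a backslash escapes a literal * in the pattern (as in Lucene's wildcard syntax). The first * below is the literal character and the trailing * is the wildcard:

```console
GET /inspectionsshort/report/_search
{
  "query": {
    "wildcard": {
      "business_name.keyword": "Soup \\**"
    }
  }
}
```

Against the sample data this should match only "Soup *Unlimited". Because the keyword field stores the value verbatim, the pattern is case-sensitive and has to account for the whole string, not just one word of it.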


(Christoph) #2

This works when you are searching using the "business_name.keyword" field. What is the problem with that?
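
For reference, a minimal sketch of the exact-match search on the keyword sub-field, assuming the mapping from the first post:

```console
GET /inspectionsshort/report/_search
{
  "query": {
    "term": {
      "business_name.keyword": "Soup *Unlimited"
    }
  }
}
```

The term query is not analyzed, and business_name.keyword holds the original string, so the * is compared literally and needs no escaping.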

