Wildcard searches

Currently I use the wildcard search below for my service:

{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "PRODUCT_DESCRIPTION": "\*collaboration\*services\*shiriyara\*"
          }
        }
      ]
    }
  }
}

This returns the expected results, but I am looking for alternative ways to achieve this without using a wildcard query, as wildcard queries are slow.

I tried "query_string" on a "standard" analyzed field. But this returns result if whole word matches.

  "query_string": {
    "default_field": "PRODUCT_DESCRIPTION",
    "default_operator": "AND",
    "query": "collaboration services shiriyara"
  }

If the query string is "collab services shiriyara", it won't return any results, whereas the wildcard query does.
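For context, here is a rough sketch (plain Python, not Elasticsearch) of why the standard analyzer behaves that way: it indexes whole lowercased words, and with the AND operator every query term must match a full token, so the partial word "collab" never matches "collaboration".

```python
import re

def standard_analyze(text):
    # Rough sketch of the "standard" analyzer: split on non-word
    # characters and lowercase (no stemming, no ngrams).
    return [t.lower() for t in re.split(r"\W+", text) if t]

doc_tokens = set(standard_analyze("Collaboration services shiriyara"))

def matches(query):
    # default_operator AND: every analyzed query term must be a full token.
    return all(t in doc_tokens for t in standard_analyze(query))

print(matches("collaboration services shiriyara"))  # True
print(matches("collab services shiriyara"))         # False: "collab" != "collaboration"
```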

Let me know if anybody has thoughts. Index-time changes are also fine with me.

An ngram-based analyzer would be a good fit. See https://www.elastic.co/guide/en/elasticsearch/reference/6.3/analysis-ngram-tokenfilter.html
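For illustration, here is a rough Python sketch (not Elasticsearch itself) of what the ngram token filter emits: every substring between min_gram and max_gram characters long. Because "collab" is a prefix of "collaboration", all of its ngrams also appear among the longer word's ngrams, which is what makes substring-style matching possible.

```python
def ngrams(token, min_gram=3, max_gram=10):
    # Sketch of the ngram token filter: emit every substring of the
    # token whose length is between min_gram and max_gram.
    return [token[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(token) - n + 1)]

print(ngrams("collab", 3, 4))
# ['col', 'oll', 'lla', 'lab', 'coll', 'olla', 'llab']

# Every ngram of "collab" is also an ngram of "collaboration":
print(set(ngrams("collab")) <= set(ngrams("collaboration")))  # True
```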

I tried ngram with these settings (elided to the relevant parts):

"ngram_filter": {
  "type": "ngram",
  "min_gram": "3",
  "max_gram": "10"
}
...
"ngramAnalyzerKey": {
  "filter": [
    "lowercase",
    "ngram_filter"
  ],
  "type": "custom",
  "tokenizer": "keyword"
}
...

This matches if any word in the query string matches, even though there is an AND in the query. Can you help?

Could you provide a full recreation script, as described in About the Elasticsearch category? It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce, and if needed fix your problem. It will also most likely help you get a faster answer.

I found the solution: I was using ngram as a token filter instead of as a tokenizer. The snippets below worked.

I created an index with the settings and mappings below:

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 10,
          "token_chars": [
                          "letter",
                          "digit",
                          "punctuation",
                          "symbol"
                          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
'

Then I inserted two documents:

curl -X PUT "localhost:9200/my_index/_doc/1" -H 'Content-Type: application/json' -d'
{
  "title": "Quick Foxes" 
}
'

curl -X PUT "localhost:9200/my_index/_doc/2" -H 'Content-Type: application/json' -d'
{
  "title": "Quick Tiger" 
}
'

Now if I search for "Quick Fo" with the query below, it returns only _doc 1, as that document matches both "Quick" and "Fo".

curl -X GET "localhost:9200/my_index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "title": {
        "query": "Quick Fo", 
        "operator": "and"
      }
    }
  }
}
'

This is what I wanted. With ngram as a token filter, it was returning results even if just "Quick" matched.
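To make the working setup concrete, here is a rough Python simulation (my own sketch, not Elasticsearch): the index side ngram-tokenizes each lowercased word (like the "autocomplete" analyzer), the search side only lowercases and splits (like "autocomplete_search"), and the AND operator then requires every whole query token to appear among the indexed ngrams.

```python
def ngram_tokenize(text, min_gram=1, max_gram=10):
    # Sketch of the "autocomplete" analyzer: the ngram tokenizer emits
    # substrings within each run of letters/digits, lowercased.
    tokens = set()
    for word in text.lower().split():
        for n in range(min_gram, max_gram + 1):
            for i in range(len(word) - n + 1):
                tokens.add(word[i:i + n])
    return tokens

def search(doc, query):
    # The "autocomplete_search" analyzer just lowercases and splits words,
    # and "operator": "and" requires every query token to be in the index.
    index = ngram_tokenize(doc)
    return all(t in index for t in query.lower().split())

print(search("Quick Foxes", "Quick Fo"))  # True: "quick" and "fo" are both indexed
print(search("Quick Tiger", "Quick Fo"))  # False: no "fo" ngram in "quick tiger"
```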

Looks like it's not working in all cases, though.

For example, I inserted two new records ("SP SSPT" and "PSS SSPT2"). When I search for "SP SSPT", I would expect only "SP SSPT" as a result, but I get both "SP SSPT" and "PSS SSPT2".

This is because ngram tokens for "PSS SSPT2" contain both SP & SSPT.
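Here is a rough Python sketch (not Elasticsearch) showing why: both "sp" and "sspt" occur as substrings of "sspt2", so every token of the query "SP SSPT" is present in the ngram index for "PSS SSPT2".

```python
def ngram_tokenize(text, min_gram=1, max_gram=10):
    # Sketch of the ngram tokenizer: all substrings of each lowercased
    # word, with lengths from min_gram to max_gram.
    tokens = set()
    for word in text.lower().split():
        for n in range(min_gram, max_gram + 1):
            for i in range(len(word) - n + 1):
                tokens.add(word[i:i + n])
    return tokens

index = ngram_tokenize("PSS SSPT2")
print("sp" in index)    # True: "sp" sits inside "sspt2"
print("sspt" in index)  # True: "sspt" is a prefix of "sspt2"
```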

Can you let me know how to handle these scenarios?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.