Wildcard searches


(Rakesh Shiriyara) #1

Currently I use the wildcard search below for my service:

{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "PRODUCT_DESCRIPTION": "\*collaboration\*services\*shiriyara\*"
          }
        }
      ]
    }
  }
}

This returns the expected result, but I am looking for alternative ways to achieve this without using a wildcard query, since wildcard queries take more time.

I tried "query_string" on a "standard" analyzed field. But this returns result if whole word matches.

  "query_string": {
    "default_field": "PRODUCT_DESCRIPTION",
    "default_operator": "AND",
    "query": "collaboration services shiriyara"
  }

If the search string is "collab services shiriyara", it returns no results, whereas the wildcard query does.

Let me know if anybody has thoughts. Index-time changes are also fine with me.


(David Pilato) #2

It'd be good to use an ngram-based analyzer. See https://www.elastic.co/guide/en/elasticsearch/reference/6.3/analysis-ngram-tokenfilter.html
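
For example, something along these lines (the index name, analyzer name, and gram sizes here are just illustrative):

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "filter": {
        "my_ngram": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 4
        }
      },
      "analyzer": {
        "my_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_ngram"
          ]
        }
      }
    }
  }
}
'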


(Rakesh Shiriyara) #3

I tried ngram with this metadata:

"ngram_filter": {
  "type": "ngram",
  "min_gram": "3",
  "max_gram": "10"
}
...
"ngramAnalyzerKey": {
  "type": "custom",
  "tokenizer": "keyword",
  "filter": [
    "lowercase",
    "ngram_filter"
  ]
}
...

This matches if any one word in the query string matches, even though there is an AND operator in the query. Can you help?
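
For reference, the grams this setup produces can be inspected with the _analyze API (the tokenizer and filter below mirror the settings from my snippet above):

curl -X GET "localhost:9200/_analyze" -H 'Content-Type: application/json' -d'
{
  "tokenizer": "keyword",
  "filter": [
    "lowercase",
    { "type": "ngram", "min_gram": 3, "max_gram": 10 }
  ],
  "text": "collab services"
}
'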


(David Pilato) #4

Could you provide a full recreation script, as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.


(Rakesh Shiriyara) #5

I found the solution: I was using ngram as a token filter instead of as a tokenizer. The code snippets below worked.

Created an index with the metadata below:

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 10,
          "token_chars": [
                          "letter",
                          "digit",
                          "punctuation",
                          "symbol"
                          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
'

Inserted 2 records:

curl -X PUT "localhost:9200/my_index/_doc/1" -H 'Content-Type: application/json' -d'
{
  "title": "Quick Foxes" 
}
'

curl -X PUT "localhost:9200/my_index/_doc/2" -H 'Content-Type: application/json' -d'
{
  "title": "Quick Tiger" 
}
'

Now if I search for "Quick Fo" with the query below, it gives me only _doc 1, as it matches both "Quick" and "Fo".

curl -X GET "localhost:9200/my_index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "title": {
        "query": "Quick Fo", 
        "operator": "and"
      }
    }
  }
}
'

This is what I wanted. But with ngram as a token filter it was returning results even if just "Quick" matched, presumably because the filter emits all grams at the same token position, so the AND operator cannot require each gram separately.
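
To see why only _doc 1 matches, the search side can be checked with _analyze against the index created above:

curl -X GET "localhost:9200/my_index/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer": "autocomplete_search",
  "text": "Quick Fo"
}
'

This yields the two tokens "quick" and "fo"; "Quick Tiger" is indexed without any gram "fo", so the AND match rejects _doc 2.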


(Rakesh Shiriyara) #6

Looks like it's not working in all cases.

For example, I inserted 2 new records ("SP SSPT" and "PSS SSPT2"). When I search for "SP SSPT", my expectation is that only "SP SSPT" should come back as a result, but I get both "SP SSPT" and "PSS SSPT2".

This is because the ngram tokens for "PSS SSPT2" contain both "sp" and "sspt".
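
This can be confirmed with _analyze against the same index:

curl -X GET "localhost:9200/my_index/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer": "autocomplete",
  "text": "PSS SSPT2"
}
'

The output lists "sp" and "sspt" among the grams of "SSPT2", and those are exactly the two tokens the search analyzer produces for the query "SP SSPT".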

Can you let me know how to handle these scenarios?


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.