nGram and wildcards


(MyHeadHurts) #1

I am having trouble searching with wildcards when the data is indexed using
an nGram tokenizer.

For example:

If I index data using the following index settings:

index: 
  analysis: 
    analyzer: 
      default: 
        type: standard 
        stopwords: _none_

I am able to search for something like "b?t" and receive search results for
"but", "bat", etc.
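To make the expected behaviour concrete, here is a minimal simulation in Python. It is only a sketch and not Elasticsearch code: it assumes the standard analyzer yields one lowercase term per word and that a wildcard pattern is compared against whole indexed terms. The helper names (`standard_terms`, `wildcard_hits`) and the sample documents are made up for illustration.

```python
import fnmatch
import re


def standard_terms(text):
    """Rough stand-in for the standard analyzer: lowercase whole-word terms."""
    return re.findall(r"\w+", text.lower())


def wildcard_hits(pattern, docs):
    """Return the docs that contain at least one term matching the pattern.

    fnmatch treats "?" as exactly one character and "*" as any run of
    characters, which mirrors the wildcard syntax used in the post.
    """
    return [
        doc
        for doc in docs
        if any(fnmatch.fnmatchcase(term, pattern) for term in standard_terms(doc))
    ]


docs = ["But why", "a bat", "a boat", "bug report"]
print(wildcard_hits("b?t", docs))  # ['But why', 'a bat']
```

"boat" is not returned because "?" stands for exactly one character, and "bug" does not end in "t".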

However, if I change the index settings to use nGram for partial word
matching, as follows:

index: 
  analysis: 
    analyzer: 
      default: 
        type: custom 
        tokenizer: nGramTokenizer 
        filter: [lowercase,stopWordsFilter] 
    tokenizer: 
      nGramTokenizer: 
        type: nGram 
        min_gram: 1 
        max_gram: 2 
    filter: 
      stopWordsFilter: 
        type: stop 
        stopwords: _none_

Then partial word matching works, i.e. searching for "bu" returns "but"
and "bug", etc. Wildcards, however, no longer seem to be supported: if I
search for "b?t", there are no matching hits.

Is there a way that I can use wildcards and nGram together?
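One way to see what may be happening is to list the terms that an nGram tokenizer with min_gram 1 and max_gram 2 would produce for a single word. The Python sketch below is a simulation under stated assumptions, not Elasticsearch code: it assumes the wildcard pattern is not analyzed and is compared directly against the indexed terms, and it only handles one word at a time. The function name `ngram_terms` is hypothetical.

```python
import fnmatch


def ngram_terms(text, min_gram=1, max_gram=2):
    """All character n-grams of length min_gram..max_gram, lowercased."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }


terms = ngram_terms("but")
print(sorted(terms))  # ['b', 'bu', 't', 'u', 'ut']

# Partial matching works: "bu" is itself an indexed term.
print("bu" in terms)  # True

# "b?t" needs a three-character term, but no term is longer than max_gram.
print(any(fnmatch.fnmatchcase(t, "b?t") for t in terms))  # False
```

In this simulation, raising max_gram to 3 makes "but" an indexed term again, so "b?t" would match; whether that trade-off is acceptable depends on the index size and the word lengths involved.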

--


(MyHeadHurts) #2

If this is not possible, is there a way to simulate a query as if we were
using the "?" wildcard character?

On Wednesday, 3 October 2012 15:14:06 UTC+2, My Head Hurts wrote:
[original post quoted in full; see #1]

--

