Hi,
I'm using Elasticsearch 1.5.2 hosted on Amazon AWS, and I connect to it with the official PHP client. When I apply an ngram analyzer to the searchable fields and then search them with a "multi_match" query, any search term that contains a space returns all entries instead of just the relevant ones. If I remove the ngram analyzer, multi-word queries behave normally and return the relevant results, but then I lose partial matching.
The following is what I've tried:
- Setting the tokenizer to whitespace, or to keyword
- Defining ngram as a tokenizer instead of as a token filter
- Using a pattern tokenizer with whitespace as the pattern, so that it splits on spaces
- Using edge-ngram instead of ngram
None of the above made any difference. One other thing I tried is the word_delimiter filter with catenate_all set to true. Its effect was to join multiple words into a single word by removing the spaces. Combined with the ngram filter, this seemed to work in some cases, but it's clearly not a viable solution because there are too many edge cases I can't account for (for example, when the two search terms aren't in the same position).
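For reference, the variants listed above can be sketched as "analysis" settings in the same PHP-array style as the code below. This is an illustrative reconstruction, not the exact settings that were tried: every analyzer, tokenizer and filter name is hypothetical, option names assume Elasticsearch 1.x, and the keyword tokenizer in the last analyzer is an assumption about how word_delimiter ended up joining the words.

```php
<?php
// Illustrative reconstruction (not the original settings) of the variants
// listed above, as Elasticsearch 1.x 'analysis' settings. Every analyzer,
// tokenizer and filter name below is hypothetical.
$analysis = array(
    'tokenizer' => array(
        // Variant: ngram as a tokenizer instead of a token filter.
        'ngram_tokenizer' => array(
            'type'        => 'nGram',
            'min_gram'    => 1,
            'max_gram'    => 15,
            'token_chars' => array('letter', 'digit'),
        ),
        // Variant: pattern tokenizer that splits on runs of whitespace.
        'space_pattern_tokenizer' => array(
            'type'    => 'pattern',
            'pattern' => '\s+',
        ),
    ),
    'filter' => array(
        'ngram_token_filter' => array(
            'type'     => 'nGram',
            'min_gram' => 1,
            'max_gram' => 15,
        ),
        // Variant: edge n-grams (prefixes only) instead of all n-grams.
        'edge_ngram_token_filter' => array(
            'type'     => 'edgeNGram',
            'min_gram' => 1,
            'max_gram' => 15,
        ),
        // Variant: word_delimiter splits a token into subwords (e.g. on
        // non-alphanumeric characters); catenate_all also emits the parts
        // joined together ("foo bar" -> "foobar").
        'catenate_filter' => array(
            'type'         => 'word_delimiter',
            'catenate_all' => true,
        ),
    ),
    'analyzer' => array(
        // Whitespace tokenizer (swap in 'keyword' for the keyword variant).
        'whitespace_ngram' => array(
            'type'      => 'custom',
            'tokenizer' => 'whitespace',
            'filter'    => array('lowercase', 'ngram_token_filter'),
        ),
        'ngram_tokenizer_analyzer' => array(
            'type'      => 'custom',
            'tokenizer' => 'ngram_tokenizer',
            'filter'    => array('lowercase'),
        ),
        'pattern_ngram' => array(
            'type'      => 'custom',
            'tokenizer' => 'space_pattern_tokenizer',
            'filter'    => array('lowercase', 'ngram_token_filter'),
        ),
        'edge_ngram_analyzer' => array(
            'type'      => 'custom',
            'tokenizer' => 'standard',
            'filter'    => array('lowercase', 'edge_ngram_token_filter'),
        ),
        // Assumption: a keyword tokenizer keeps "foo bar" as one token, so
        // word_delimiter sees the space and catenate_all can join the words.
        'catenate_ngram' => array(
            'type'      => 'custom',
            'tokenizer' => 'keyword',
            'filter'    => array('lowercase', 'catenate_filter', 'ngram_token_filter'),
        ),
    ),
);

// Print the settings as JSON, e.g. to compare with GET /<index>/_settings.
echo json_encode(array('analysis' => $analysis), JSON_PRETTY_PRINT), PHP_EOL;
```

Printing the JSON makes it easy to diff each variant against what the index actually reports in its settings.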
My requirement is to have partial matching and also to allow a search term with spaces in it.
Following is my code.
// Analyzer
'analysis' => array(
    'filter' => array(
        'ngram_token_filter' => array(
            'type'     => 'nGram',
            'min_gram' => '1',
            'max_gram' => '15'
        )
    ),
    'analyzer' => array(
        'ngram_analyzer' => array(
            'type'      => 'custom',
            'tokenizer' => 'standard',
            'filter'    => array(
                'lowercase',
                'ngram_token_filter'
            )
        )
    )
)
// Mapping
'properties' => array(
    'title'       => array('type' => 'string', 'analyzer' => 'ngram_analyzer'),
    'description' => array('type' => 'string', 'analyzer' => 'ngram_analyzer'),
    'type'        => array('type' => 'string', 'analyzer' => 'ngram_analyzer'),
    'status'      => array('type' => 'byte')
)
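Since this mapping sets only "analyzer", Elasticsearch 1.x applies ngram_analyzer to the search text as well as at index time, so a multi-word search term is itself broken into n-grams. The sketch below roughly emulates that output. It is an approximation for illustration only: the function name is hypothetical, the standard tokenizer is approximated by a whitespace split, input is assumed to be ASCII, and token order may differ from what Elasticsearch emits; the _analyze API shows the real tokens.

```php
<?php
// Rough, hypothetical emulation of 'ngram_analyzer' for illustration:
// - the 'standard' tokenizer is approximated by splitting on whitespace
//   (the real tokenizer also handles punctuation and Unicode boundaries),
// - 'lowercase' is applied with strtolower (ASCII input assumed),
// - the 'nGram' filter emits every substring of length minGram..maxGram.
function emulateNgramAnalyzer($text, $minGram = 1, $maxGram = 15)
{
    $grams = array();
    $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $len = strlen($word);
        // Every start offset, every gram size that still fits in the word.
        for ($start = 0; $start < $len; $start++) {
            for ($size = $minGram; $size <= $maxGram && $start + $size <= $len; $size++) {
                $grams[] = substr($word, $start, $size);
            }
        }
    }
    return $grams;
}

$grams = emulateNgramAnalyzer('Red Car');
echo count($grams), ' grams: ', implode(' ', $grams), PHP_EOL;
// prints "12 grams: r re red e ed d c ca car a ar r"
```

With min_gram set to 1, the search term yields single-letter grams such as "r", "e" and "a", which are likely to occur in almost every indexed document, so a query built from them can match far more than intended.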
// Query
'query' => array(
    'filtered' => array(
        'query' => array(
            'multi_match' => array(
                'query'                => $searchTerm,
                'type'                 => 'most_fields',
                'minimum_should_match' => '75%',
                'fields'               => array('title^2', 'description', 'type')
            )
        ),
        'filter' => array(
            'bool' => array(
                'must' => array(
                    array(
                        'term' => array(
                            'status' => 1
                        )
                    )
                )
            )
        )
    )
)
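To reproduce the query outside PHP (for example with curl or Sense against the _search endpoint), the same body can be JSON-encoded. A minimal sketch, assuming the query array above is what gets passed as the request body; the sample term and the $body variable name are hypothetical.

```php
<?php
// Hypothetical sample input containing a space, the case that misbehaves.
$searchTerm = 'red car';

// Same query as above, wrapped in the request body sent to _search.
$body = array(
    'query' => array(
        'filtered' => array(
            'query' => array(
                'multi_match' => array(
                    'query'                => $searchTerm,
                    'type'                 => 'most_fields',
                    'minimum_should_match' => '75%',
                    'fields'               => array('title^2', 'description', 'type'),
                ),
            ),
            'filter' => array(
                'bool' => array(
                    'must' => array(
                        array('term' => array('status' => 1)),
                    ),
                ),
            ),
        ),
    ),
);

// Print the JSON so it can be pasted into curl or Sense for debugging.
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

Running the printed JSON directly against the index takes the PHP client out of the picture when comparing single-word and multi-word results.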
Any help would be greatly appreciated. Thanks.