Phrase Suggester with multiple Fields


(brupm) #1

I am using the phrase suggester to implement did-you-mean functionality. My
source field is named did_you_mean_source; it holds the first and last name
joined by a single space.

When I search for, say, "allex blak", I do get fairly decent suggestions,
including the "alex black" I am hoping to get.

The problem is that I also get "alex [any other first and/or last name
present in the index that is a suggestion for blak]". Such a suggestion
doesn't correspond to any actual search result (as opposed to a suggestion
result) in my index.

So you search for a first and last name that you happen to have misspelled,
but the suggestions propose another spelling which doesn't actually return
any search results from the index. In other words, I need to somehow
preserve the fact that one term is a first name and the other is a last name.

What I am trying to achieve is did-you-mean functionality where I can
type a first and last name and get suggestions based on
did_you_mean_source, but only where both typed words match a single document.

Here's my analyzer:
did_you_mean:
  type: custom
  tokenizer: standard
  filter: ["lowercase", "trim"]

Here's the suggest part of my query:

"suggest": {
  "text": "allex blak",
  "did_you_mean": {
    "phrase": {
      "field": "did_you_mean_source",
      "real_word_error_likelihood": 0.90,
      "max_errors": 1,
      "direct_generator": [{
        "field": "did_you_mean_source",
        "suggest_mode": "always",
        "min_word_length": 3,
        "size": 5,
        "prefix_length": 2,
        "min_doc_freq": 1
      }]
    }
  }
}

Here's what my mappings look like:

{
  "development_search_suggestions": {
    "mappings": {
      "search_suggestion": {
        "_all": {
          "enabled": false
        },
        "properties": {
          "did_you_mean_source": {
            "type": "string",
            "analyzer": "did_you_mean"
          },
          "keywords": {
            "type": "string",
            "index_options": "offsets",
            "analyzer": "full",
            "fields": {
              "partial": {
                "type": "string",
                "index_options": "offsets",
                "index_analyzer": "partial_auto_suggest",
                "search_analyzer": "full_with_auto_suggest_synonyms"
              },
              "synonymic": {
                "type": "string",
                "index_analyzer": "full_with_auto_suggest_synonyms",
                "search_analyzer": "full"
              }
            }
          },
          "keywords_auxiliary": {
            "type": "string",
            "index_options": "offsets",
            "analyzer": "full",
            "fields": {
              "partial": {
                "type": "string",
                "index_options": "offsets",
                "index_analyzer": "partial_auto_suggest",
                "search_analyzer": "full_with_auto_suggest_synonyms"
              },
              "synonymic": {
                "type": "string",
                "index_analyzer": "full_with_auto_suggest_synonyms",
                "search_analyzer": "full"
              }
            }
          }
        }
      }
    }
  }
}

A few examples:

Say my index contains the following documents, each a first name followed
by a last name:
bruno miranda
miranda bella
bran scott

If I search for "brno miranda", I should get the suggestion "bruno miranda",
but I should not get "bran miranda", because no such document exists in the
db. It is just the first name from one document combined with the last name
from a different document.
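
To make the constraint concrete, here is a minimal Python sketch of the filtering I am after, applied outside Elasticsearch. It is only an illustration: the helper names (tokens, keep_matching) are made up, the tokenization is a rough stand-in for the did_you_mean analyzer, and the candidate list is typed in by hand rather than taken from a real suggest response. It compares token sets, so word order is ignored.

```python
def tokens(text):
    """Lowercase and split on whitespace (rough stand-in for the analyzer)."""
    return text.lower().split()


def keep_matching(suggestions, documents):
    """Keep only suggestions whose tokens all occur together in one document."""
    doc_token_sets = [set(tokens(doc)) for doc in documents]
    return [
        s for s in suggestions
        if any(set(tokens(s)) <= doc_tokens for doc_tokens in doc_token_sets)
    ]


# Hypothetical data mirroring the example above.
documents = ["bruno miranda", "miranda bella", "bran scott"]
candidates = ["bruno miranda", "bran miranda"]

filtered = keep_matching(candidates, documents)
print(filtered)  # ['bruno miranda']
```

"bran miranda" is dropped because "bran" and "miranda" never appear in the same document, which is the behavior I would like the suggester itself to enforce.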



(system) #2