Best practice for prefix search

We have a need to index a few fields (first name, last name, address, and
id number). All fields are alphanumeric.

The users will enter terms into a text input. We want to be able to search all
the fields by prefix, while giving more preference to, say, a full id number
than to a partial name.

So, say someone searches for John: there might be 1000 Johns in the system,
but as they keep typing alphanumeric id characters it starts matching partial
ids by prefix. When a user enters a full id, that should take precedence over
anything else in the system. First, I want to make sure I'm approaching the
indexing correctly; I'm attaching what I have now below. I'm also a bit unsure
about the best query strategy to use for this. I believe the prefix query
doesn't analyze the terms, so it's not a good candidate. The query_string
query works, but it requires users to enter wildcards. These users will just
type names, city, id, etc. into an autocomplete input, so I don't want to
burden them with advanced query syntax. I'm sure I need a combination, and
I've tried a bunch, from text to query_string to prefix, combining them with
bool and dis_max, but still haven't quite gotten it right. Maybe someone can
suggest the best strategy to search multiple fields with a user-provided term
string, without requiring wildcards.

I've created an index that does both exact and edge-ngram indexing on each
field in question, but I'm not sure both are needed. Isn't the edge ngram
alone enough, as long as max_gram covers the longest string you care about, or
do you still need the exact field as well (for efficiency)?
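For example, this is the kind of check I've been doing with the _analyze API
to see what the prefix analyzer actually emits for a long id (the index name
and the id value here are just examples):

curl -XGET 'localhost:9200/athletes/_analyze?analyzer=prefix_string' -d 'AB12345678901'

Since max_gram is 10, a 13-character id never shows up as a whole token in the
partial field, so my guess is the exact field is still needed to match (and
boost) a complete id, not just for efficiency.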

Here are my settings:

"analysis": {
"analyzer" : {
"full_string":{
"filter":[
"standard",
"lowercase",
"asciifolding"
],
"type":"custom",
"tokenizer":"standard"
},
"prefix_string":{
"filter":[
"standard",
"lowercase",
"asciifolding",
"string_ngrams"
],
"type":"custom",
"tokenizer":"standard"
}
},
"filter" : {
"string_ngrams" : {
"type" : "edgeNGram",
"min_gram" : 2,
"max_gram" : 10,
"side" : 'front'
}
}
}

And here is the mapping:

{
  "first_name": {
    "type": "multi_field",
    "fields": {
      "first_name": {
        "type": "string",
        "analyzer": "full_string"
      },
      "partial": {
        "type": "string",
        "index_analyzer": "prefix_string",
        "search_analyzer": "full_string"
      }
    }
  },
  "last_name": {
    "type": "multi_field",
    "fields": {
      "last_name": {
        "type": "string",
        "analyzer": "full_string"
      },
      "partial": {
        "type": "string",
        "index_analyzer": "prefix_string",
        "search_analyzer": "full_string"
      }
    }
  },
  "city": {
    "type": "multi_field",
    "fields": {
      "city": {
        "type": "string",
        "analyzer": "full_string"
      },
      "partial": {
        "type": "string",
        "index_analyzer": "prefix_string",
        "search_analyzer": "full_string"
      }
    }
  },
  "athlete_id": {
    "type": "multi_field",
    "fields": {
      "athlete_id": {
        "type": "string",
        "analyzer": "full_string"
      },
      "partial": {
        "type": "string",
        "index_analyzer": "prefix_string",
        "search_analyzer": "full_string"
      }
    }
  }
}

--

Have you looked into using just a multi_match query with a match_phrase_prefix?
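
Something roughly like this (untested; whether multi_match accepts the
phrase_prefix type directly depends on the version you're running, otherwise
you can wrap separate match_phrase_prefix clauses in a bool or dis_max; the
boost and USER_INPUT are placeholders):

{
  "query": {
    "multi_match": {
      "query": "USER_INPUT",
      "type": "phrase_prefix",
      "fields": [
        "first_name",
        "last_name",
        "city",
        "athlete_id^5"
      ]
    }
  }
}

match_phrase_prefix does the prefix expansion at query time, so it doesn't
rely on the edge-ngram subfields.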

Karel
