Trying to Solve Relevancy and Performance Problems

Currently I have around 10 million records indexed, mainly user metadata
such as first name and last name, with the following mapping:

Field Mapping:

"user" : {
  "_source" : { "enabled" : true },
  "properties" : {
    "uniqueid" : { "type" : "string", "index" : "not_analyzed" },
    "fname" : { "type" : "string", "index" : "analyzed", "analyzer" : "standard" },
    "lname" : { "type" : "string", "index" : "analyzed", "analyzer" : "standard" },
    ...
  }
}

I currently have a user lookup feature that lets users type one or more
words to look up user metadata without specifying which field they are
searching. It is a sort of prefix search. For example, a user can search
for "Chris Mos", and the results should come back in the following order:

  1. Chris Mos - the record that has fname Chris and lname Mos

  2. Chris SomeLastName - records that have fname Chris

  3. SomeFirstName Mos - records that have lname Mos

  4. Christopher Mosby - all remaining records, which should have either
    Chris or Mos as a prefix
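To make the intended ranking concrete, here is a small pure-Python sketch of the four tiers above. It assumes a two-word query, case-insensitive whole-word comparison for tiers 1 to 3, and plain prefix matching on either name for tier 4. The helper `tier` and the sample names are hypothetical; this only models the desired order, not how Elasticsearch scores documents.

```python
def tier(query: str, fname: str, lname: str) -> int:
    """Return the desired rank tier (1 = best) for a two-word query; 5 means no match.

    Hypothetical helper: assumes the query is exactly "<first> <last>".
    """
    first, last = query.lower().split()
    f, l = fname.lower(), lname.lower()
    if f == first and l == last:
        return 1  # exact first and last name
    if f == first:
        return 2  # exact first name only
    if l == last:
        return 3  # exact last name only
    if any(value.startswith(prefix) for value in (f, l) for prefix in (first, last)):
        return 4  # either query word is a prefix of either name
    return 5


# Hypothetical sample records (fname, lname).
records = [("Christopher", "Mosby"), ("Anna", "Mos"), ("Chris", "Smith"),
           ("Chris", "Mos"), ("Bob", "Jones")]

query = "Chris Mos"
# Drop non-matches, then sort by tier (Python's sort is stable within a tier).
ranked = sorted((r for r in records if tier(query, *r) < 5),
                key=lambda r: tier(query, *r))
print(ranked)
```

Note that tiers 2 and 3 depend on whole-word equality, which a pure prefix query cannot distinguish from a longer name starting with the same letters.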

I am firing the following query, which satisfies criterion 1 about 95% of
the time, but the rest of the results don't necessarily follow the order
defined above:

{
  "explain": false,
  "query": {
    "query_string": {
      "query": "Chris* Mos*",
      "fields": ["fname^1.5", "lname^1.0"],
      "use_dis_max": true,
      "analyze_wildcard": true
    }
  }
}
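One plausible explanation for the ordering problem can be sketched with a simplified scoring model. It assumes that each `fname:chris*`-style wildcard clause is rewritten to a constant score (so an exact "Chris" scores no higher than "Christopher"), that `use_dis_max` keeps the best field per word with a tie breaker of 0, and it ignores norms, coord and queryNorm. The names `FIELD_BOOSTS`, `clause_score` and `query_score` and the sample records are hypothetical.

```python
# Simplified model of the query_string scoring above. ASSUMPTIONS: each
# wildcard clause scores a constant (its field boost) when it matches,
# dis_max uses tie_breaker 0, and norms/coord/queryNorm are ignored.
FIELD_BOOSTS = {"fname": 1.5, "lname": 1.0}


def clause_score(prefix: str, doc: dict) -> float:
    """dis_max over fields: the best boost among fields that start with the prefix."""
    matches = [boost for field, boost in FIELD_BOOSTS.items()
               if doc[field].lower().startswith(prefix.lower())]
    return max(matches, default=0.0)


def query_score(words: list, doc: dict) -> float:
    """Top-level OR of the per-word dis_max clauses: their scores are summed."""
    return sum(clause_score(w, doc) for w in words)


# Hypothetical sample records, one per tier of the desired order.
docs = [
    {"fname": "Chris", "lname": "Mos"},
    {"fname": "Chris", "lname": "Smith"},
    {"fname": "Anna", "lname": "Mos"},
    {"fname": "Christopher", "lname": "Mosby"},
]
for doc in docs:
    print(doc["fname"], doc["lname"], query_score(["Chris", "Mos"], doc))
```

Under these assumptions "Christopher Mosby" ties with "Chris Mos" (2.5 each) and outranks "Chris Smith" (1.5) and "Anna Mos" (1.0), which matches the symptom described: exact whole-word matches would need their own non-wildcard clauses or fields to rank above prefix-only matches.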

There are a couple of problems I am trying to solve here:

  1. Relevancy - Is there a better way to query Elasticsearch, or, given
    the scenario, should I be using a different analyzer?

  2. Performance - With the number of records growing at a very fast pace,
    I feel that wildcard search isn't the best approach. Is there an
    alternative way to query, or another analyzer I could use? I tried the
    nGram analyzer before, but the issue was that it also returned records
    with "Chris" in the middle of the text. For example, records with fname
    "achristy" were also returned.
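The difference between the nGram behaviour described above and an edge n-gram approach can be modelled in a few lines of Python. This is a simplified sketch, assuming tokens are already lowercased; the real Elasticsearch token filters also track positions and offsets, which are ignored here. The function names are hypothetical.

```python
def ngrams(token: str, min_gram: int, max_gram: int) -> list:
    """All substrings of length min_gram..max_gram (models an nGram filter)."""
    return [token[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(token) - n + 1)]


def edge_ngrams(token: str, min_gram: int, max_gram: int) -> list:
    """Only prefixes of length min_gram..max_gram (models a front edge n-gram filter)."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


print("chris" in ngrams("achristy", 2, 10))       # True: mid-word substring is indexed
print("chris" in edge_ngrams("achristy", 2, 10))  # False: only prefixes are indexed
print(edge_ngrams("christopher", 2, 5))           # ['ch', 'chr', 'chri', 'chris']
```

Because only prefixes are indexed, "achristy" no longer matches "chris", while "christopher" still does.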

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Try with EdgeNGram.
Also have a look at the multi_match query instead of query_string.
And don't use wildcards. Here is what you can read in the docs: "Note this query can be slow, as it needs to iterate over many terms."
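A minimal sketch of what this suggestion could look like, built as Python dicts and printed as JSON request bodies. It assumes a recent Elasticsearch mapping syntax (`text` fields, the `edge_ngram` token filter, `search_analyzer`); the 0.90-era cluster in this thread used `string` fields and the `edgeNGram` filter name instead. The filter, analyzer and sub-field names (`name_edge_ngram`, `name_prefix`, `fname.prefix`, `lname.prefix`) and the boost values are hypothetical placeholders to tune. The idea is to index edge n-grams into a sub-field but analyze the query with the `standard` analyzer, so that "chris" is looked up as one whole token rather than split into "ch", "chr" and so on, while the plain fields let exact matches be boosted above prefix-only matches.

```python
import json

# Hypothetical names: "name_edge_ngram" filter, "name_prefix" analyzer, ".prefix" sub-fields.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Front-anchored grams only: "chris" -> "ch", "chr", "chri", "chris".
                "name_edge_ngram": {"type": "edge_ngram", "min_gram": 2, "max_gram": 20}
            },
            "analyzer": {
                "name_prefix": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "name_edge_ngram"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            name: {
                "type": "text",
                "analyzer": "standard",  # whole words, used for exact-match boosting
                "fields": {
                    "prefix": {
                        "type": "text",
                        "analyzer": "name_prefix",      # edge n-grams at index time
                        "search_analyzer": "standard",  # whole words at query time
                    }
                },
            }
            for name in ("fname", "lname")
        }
    },
}

query = {
    "query": {
        "multi_match": {
            "query": "Chris Mos",
            # most_fields sums per-field scores, so whole-word hits on fname/lname
            # add to the prefix hits and exact matches tend to rank higher.
            "type": "most_fields",
            "fields": ["fname^3", "lname^2", "fname.prefix", "lname.prefix"],
        }
    }
}

print(json.dumps(settings, indent=2))
print(json.dumps(query, indent=2))
```

The settings body would be sent when creating the index and the query body to `_search`. Since term statistics differ per name, the boosts only make the intended order likely rather than guaranteed; running a few queries with `explain` is a reasonable way to tune them.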

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

In reply to Accu Search <accusearch83@gmail.com>, May 8, 2013, at 07:38.


Thanks, David. One more question: one future enhancement will be something
like auto-suggest, with a minimum of 2 characters before suggestions appear.
Hence I am planning to set min_gram to 2, but I am in a conundrum about
max_gram. Since I am indexing names, I was thinking of setting max_gram to
40. Is setting max_gram to a high value recommended, or is it usually closer
to min_gram?
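The max_gram trade-off can be reasoned about with simple arithmetic. A minimal Python sketch, assuming the front edge n-gram setup with min_gram 2 and a query analyzed as whole words (not n-grammed): a name of length L produces min(max_gram, L) - min_gram + 1 grams, so a large max_gram only adds terms for names longer than a smaller limit would allow, while a query word longer than max_gram can no longer match the prefix field at all. The helper names are hypothetical.

```python
def edge_gram_count(length: int, min_gram: int, max_gram: int) -> int:
    """Number of front edge n-grams emitted for a token of the given length."""
    return max(0, min(max_gram, length) - min_gram + 1)


def prefix_hit(query_word: str, indexed_word: str, min_gram: int, max_gram: int) -> bool:
    """True if the whole query word equals one of the indexed word's edge n-grams."""
    n = len(query_word)
    return (min_gram <= n <= min(max_gram, len(indexed_word))
            and indexed_word.startswith(query_word))


for max_gram in (10, 40):
    print(max_gram,
          edge_gram_count(len("chris"), 2, max_gram),           # 4 either way
          edge_gram_count(len("christopherson"), 2, max_gram),  # grows only for long names
          prefix_hit("christophers", "christopherson", 2, max_gram))
```

So max_gram roughly needs to be as long as the longest prefix users are expected to type; for typical short names a high value costs little extra index space.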

In reply to David Pilato, Wednesday, May 8, 2013, 12:56:40 AM UTC-7.
