Trying to Solve Relevancy and Performance Problem

Accu_Search · May 8, 2013, 5:38am

I currently have around 10 million records indexed, mainly user
metadata such as first name and last name, with the following mapping:

Field Mapping:

"user" : {
  "_source" : { "enabled" : true },
  "properties" : {
    "uniqueid" : { "type" : "string", "index" : "not_analyzed" },
    "fname" : { "type" : "string", "index" : "analyzed", "analyzer" : "standard" },
    "lname" : { "type" : "string", "index" : "analyzed", "analyzer" : "standard" }
    .
    .
    .
  }
}

I currently have a user lookup feature that lets users type one or more
words to look up user metadata without specifying which field to search.
It is a sort of prefix search. For example, a user can search for
"Chris Mos", and the expected results should be in the following order:

1. Chris Mos - the record that has fname Chris and lname Mos
2. Chris SomeLastName - records that have fname Chris
3. SomeFirstName Mos - records that have lname Mos
4. Christopher Mosby - all remaining records, which should have either
   Chris or Mos as a prefix

I am running the following query, which satisfies criterion 1 about 95% of
the time, but the rest of the results don't necessarily follow the order
defined above:

{
  "explain" : "false",
  "query" : {
    "query_string" : {
      "query" : "Chris* Mos*",
      "fields" : [ "fname ^1.5", "lname ^1.0" ],
      "use_dis_max" : true,
      "analyze_wildcard" : true
    }
  }
}

There are a couple of problems I am trying to solve here:

Relevancy - Is there a better way to query Elasticsearch, or, given the
scenario, should I be using a different analyzer?
Performance - With records growing at a very fast pace, I feel that
wildcard search isn't the best approach. Is there an alternative way to
query, or can I make use of other analyzers? I tried an nGram analyzer
before, but it also returned records that had "Chris" in mid-text. For
example, records with fname "achristy" were also returned.
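
The mid-text match described above can be reproduced with a small sketch. This is a simplified model of what n-gram and edge n-gram tokenization emit, not Elasticsearch's own implementation; the function names `ngrams` and `edge_ngrams` and the gram sizes 2 and 10 are assumptions chosen for illustration.

```python
def ngrams(token, min_gram, max_gram):
    """All substrings of length min_gram..max_gram, from every start position."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


def edge_ngrams(token, min_gram, max_gram):
    """Only the substrings anchored at the start of the token (its prefixes)."""
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}


if __name__ == "__main__":
    query = "chris"
    for name in ("chris", "christopher", "achristy"):
        # Columns: name, matches via n-grams, matches via edge n-grams
        print(name, query in ngrams(name, 2, 10), query in edge_ngrams(name, 2, 10))
```

In this model "achristy" contains the gram "chris" when every start position is considered, but not when only prefixes are kept, which is the behaviour a prefix search needs.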

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

dadoonet · May 8, 2013, 7:56am

Try an edge n-gram (edgeNGram) analyzer.
Also have a look at the multi_match query instead of query_string.
And don't use wildcards. Here is what the docs say about them: "Note this query can be slow, as it needs to iterate over many terms."
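
A minimal sketch of how these suggestions might fit together, written as Python dictionaries that serialize to the JSON request bodies. It assumes the mapping syntax of the Elasticsearch version used in this thread ("string" fields with separate "index_analyzer" and "search_analyzer"). The filter name `name_edge`, the analyzer name `name_prefix`, the helper `name_query`, and the gram sizes are hypothetical placeholders, not values recommended in the thread. Whether the resulting scores give exactly the ordering asked for would still need to be tested.

```python
import json

# Hypothetical names: "name_edge" (token filter) and "name_prefix" (analyzer).
# The gram sizes are placeholders and should be tuned for the real data.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "name_edge": {"type": "edgeNGram", "min_gram": 2, "max_gram": 20}
            },
            "analyzer": {
                "name_prefix": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "name_edge"],
                }
            },
        }
    },
    "mappings": {
        "user": {
            "properties": {
                # Edge n-grams are produced at index time only; the search
                # text is analyzed normally so it is matched as typed.
                "fname": {
                    "type": "string",
                    "index_analyzer": "name_prefix",
                    "search_analyzer": "standard",
                },
                "lname": {
                    "type": "string",
                    "index_analyzer": "name_prefix",
                    "search_analyzer": "standard",
                },
            }
        }
    },
}


def name_query(text):
    """Build a multi_match body over both name fields, with no wildcards."""
    return {
        "query": {
            "multi_match": {"query": text, "fields": ["fname^1.5", "lname"]}
        }
    }


if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
    print(json.dumps(name_query("Chris Mos"), indent=2))
```

The idea is that the prefix expansion is paid for once at index time, so the query itself is an ordinary term lookup rather than a scan over many terms.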

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Accu_Search · May 9, 2013, 5:25am

Thanks, David. One more question: one of the future enhancements would be
something like auto-suggest, and the minimum number of characters for
auto-suggest is 2. Hence I am planning to set the min gram size to 2, but I
am in a conundrum about the max gram size. Given that I am indexing names, I
was thinking of setting the max size to 40. Is setting max gram to a high
value recommended, or is it usually closer to the min gram size?
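
One way to reason about the trade-off is to count how many grams each name would produce. The sketch below is a simplified model, not Elasticsearch code: it assumes an edge n-gram filter emits one gram per length from min gram up to the shorter of the token length and max gram. The helper name `edge_gram_count` and the comparison value 5 are assumptions for illustration.

```python
def edge_gram_count(token, min_gram, max_gram):
    """Number of prefix grams emitted for one token under the model above."""
    upper = min(len(token), max_gram)
    return max(0, upper - min_gram + 1)


if __name__ == "__main__":
    for name in ("Mos", "Chris", "Christopher"):
        # Columns: name, grams with max gram 5, grams with max gram 40
        print(name, edge_gram_count(name, 2, 5), edge_gram_count(name, 2, 40))
```

Under this model a max gram of 40 only adds grams for tokens longer than the smaller limit, so the extra index size depends on how long the names actually are. A small max gram, on the other hand, means no gram exists for a typed prefix longer than that limit.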

