Hi all,
I want to make a fuzzy query that is expected to work on multiple
fields. What is the correct way to do it?
I made the following queries https://gist.github.com/923848 but they
do not work
Thanks
pcdinh
Hi pcdinh
> I want to make a fuzzy query that is expected to work on multiple
> fields. What is the correct way to do it?
> I made the following queries https://gist.github.com/923848 but they
> do not work
The easiest way to do it is to use the ~ operator in the query string,
for instance:
curl -XGET 'http://127.0.0.1:9200/accdev/exp/_search?pretty=1' -d '
{
    "query" : {
        "query_string" : {
            "query" : "username:pcdinh~0.5 fullname_idx:pcdinh~0.1^3",
            "fuzzy_prefix_length" : 1
        }
    }
}
'
For more on the Lucene query string syntax, see:
http://lucene.apache.org/java/3_0_0/queryparsersyntax.html
The query string is a bit less configurable than a fuzzy query (for
instance, you can't specify a per-field fuzzy_prefix_length).
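To make the query string syntax concrete, here is a minimal Python sketch that assembles the same request body from per-field settings. The helper names (fuzzy_clause, build_query_string_body) are hypothetical and not part of any Elasticsearch client, and the sketch does not escape characters that are special to the Lucene query parser.

```python
import json


def fuzzy_clause(field, term, min_similarity, boost=None):
    """Render one field:term~similarity clause, optionally followed by ^boost."""
    clause = f"{field}:{term}~{min_similarity}"
    if boost is not None:
        clause += f"^{boost}"
    return clause


def build_query_string_body(clauses, fuzzy_prefix_length=1):
    """Join the clauses into a single query_string request body.

    fuzzy_prefix_length applies to every fuzzy clause in the string.
    """
    return {
        "query": {
            "query_string": {
                "query": " ".join(clauses),
                "fuzzy_prefix_length": fuzzy_prefix_length,
            }
        }
    }


body = build_query_string_body([
    fuzzy_clause("username", "pcdinh", 0.5),
    fuzzy_clause("fullname_idx", "pcdinh", 0.1, boost=3),
])
print(json.dumps(body, indent=2))
```

The printed JSON is what would go in the -d argument of the curl command above.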
In order to use the fuzzy query against two different fields, you need
to use two fuzzy queries:
{
    "fuzzy" : {
        "username" : {
            "min_similarity" : 0.5,
            "boost" : 1,
            "value" : "pcdin",
            "prefix_length" : 0
        }
    }
},
{
    "fuzzy" : {
        "fullname_idx" : {
            "min_similarity" : 0.1,
            "boost" : 3,
            "value" : "pcdinh",
            "prefix_length" : 1
        }
    }
}
But you need to combine these two queries somehow. Your options are to
wrap them in either a 'bool' query, or a 'dis_max' query:
bool:
curl -XGET 'http://127.0.0.1:9200/accdev/exp/_search?pretty=1' -d '
{
    "query" : {
        "bool" : {
            "should" : [
                FUZZY QUERIES HERE
            ]
        }
    }
}
'
dis_max:
curl -XGET 'http://127.0.0.1:9200/accdev/exp/_search?pretty=1' -d '
{
    "query" : {
        "dis_max" : {
            "queries" : [
                FUZZY QUERIES HERE
            ],
            "tie_breaker" : 0.7
        }
    }
}
'
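The difference between bool and dis_max is how each combines the scores
when both queries match. The bool query adds the two scores together.
The dis_max query returns the better of the two scores, plus tie_breaker
times the score of the other matching query (so with a tie_breaker of 0,
only the better score counts).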
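A small worked example may help here. The Python sketch below is a simplified model of the two combination rules, not the actual Lucene scorer: it ignores coordination and normalization factors, and the per-clause scores 2.0 and 1.0 are made-up numbers chosen only for illustration.

```python
def bool_should_score(scores):
    """Simplified bool/should: sum the scores of the clauses that matched."""
    return sum(scores)


def dis_max_score(scores, tie_breaker=0.0):
    """dis_max: the best score plus tie_breaker times the remaining scores."""
    if not scores:
        return 0.0
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


# Hypothetical scores for the username and fullname_idx clauses.
matched = [2.0, 1.0]

print("bool:", bool_should_score(matched))
print("dis_max, tie_breaker 0:", dis_max_score(matched))
print("dis_max, tie_breaker 0.7:", round(dis_max_score(matched, tie_breaker=0.7), 2))
```

With these numbers the bool score is the plain sum, while dis_max with a tie_breaker of 0.7 lands between the best single score and the sum, which is why a higher tie_breaker makes dis_max behave more like bool.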
In your case, where you're searching for a matching user, it would
probably make sense to use the bool query, so the full query would look
like this:
curl -XGET 'http://127.0.0.1:9200/accdev/exp/_search?pretty=1' -d '
{
    "query" : {
        "bool" : {
            "should" : [
                {
                    "fuzzy" : {
                        "username" : {
                            "min_similarity" : 0.5,
                            "boost" : 1,
                            "value" : "pcdin",
                            "prefix_length" : 0
                        }
                    }
                },
                {
                    "fuzzy" : {
                        "fullname_idx" : {
                            "min_similarity" : 0.1,
                            "boost" : 3,
                            "value" : "pcdinh",
                            "prefix_length" : 1
                        }
                    }
                }
            ]
        }
    }
}
'
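If the list of fields grows, the same body can be generated rather than written by hand. This is a minimal Python sketch under the assumption that the request is built as a plain dictionary and serialized to JSON; fuzzy_query, bool_should and dis_max are hypothetical helper names, and the parameter names simply mirror the ones used in the queries above.

```python
import json


def fuzzy_query(field, value, min_similarity, boost=1, prefix_length=0):
    """Build one fuzzy query clause for a single field."""
    return {
        "fuzzy": {
            field: {
                "min_similarity": min_similarity,
                "boost": boost,
                "value": value,
                "prefix_length": prefix_length,
            }
        }
    }


def bool_should(queries):
    """Wrap the clauses in a bool query so matching scores are added up."""
    return {"query": {"bool": {"should": list(queries)}}}


def dis_max(queries, tie_breaker=0.7):
    """Wrap the clauses in a dis_max query so the best score dominates."""
    return {"query": {"dis_max": {"queries": list(queries), "tie_breaker": tie_breaker}}}


clauses = [
    fuzzy_query("username", "pcdin", 0.5, boost=1, prefix_length=0),
    fuzzy_query("fullname_idx", "pcdinh", 0.1, boost=3, prefix_length=1),
]
body = bool_should(clauses)
print(json.dumps(body, indent=2))
```

Swapping bool_should(clauses) for dis_max(clauses) produces the dis_max variant with the same two fuzzy clauses.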
Alternatively, unless you have disabled it, the values for both username
and fullname_idx are also indexed in the _all field. So, (sacrificing
the per-field customizations) you could just do:
curl -XGET 'http://127.0.0.1:9200/accdev/exp/_search?pretty=1' -d '
{
    "query" : {
        "fuzzy" : {
            "_all" : {
                "min_similarity" : 0.1,
                "value" : "pcidnh"
            }
        }
    }
}
'
Or even:
curl -XGET 'http://127.0.0.1:9200/accdev/exp/_search?pretty=1' -d '
{
    "query" : {
        "field" : {
            "_all" : "pcidnh~0.1"
        }
    }
}
'
clint
Hi Clinton,
Thanks a lot for your excellent answer. I will use the Lucene syntax to
reduce the request bandwidth to the ES server.
Also, it would be great to see your answer become part of the ES documentation.
Regards,
pcdinh
On 17 April, 16:31, Clinton Gormley clin...@iannounce.co.uk
wrote: