Hi guys,
I am brand new to ElasticSearch and am currently exploring its
features. One I am interested in is the fuzzy query, which I am
testing and having trouble using. It is probably a basic question,
so I hope someone who has already used this feature will quickly find
the answer.
I already posted this question on StackOverflow but haven't received
an answer yet. If you want to take a look, this is the web page:
BTW I have the feeling that it might not be only related to
ElasticSearch but maybe directly to Lucene.
Let's start with a new index named "firstindex" in which I store a
"node" object whose "label" is "american football". This is the
command I use.
bash-3.2$ curl -XPOST 'http://localhost:9200/firstindex/node/?pretty=true' -d '{
  "node" : {
    "label" : "american football"
  }
}'
This is the result I get.
{
"ok" : true,
"_index" : "firstindex",
"_type" : "node",
"_id" : "6TXNrLSESYepXPpFWjpl1A",
"_version" : 1
}
So far so good, now I want to find this entry using a fuzzy query.
This is the one I send:
bash-3.2$ curl -XGET 'http://localhost:9200/firstindex/node/_search?pretty=true' -d '{
  "query" : {
    "fuzzy" : {
      "label" : {
        "value" : "american football",
        "boost" : 1.0,
        "min_similarity" : 0.0,
        "prefix_length" : 0
      }
    }
  }
}'
And this is the result I get:
{
"took" : 15,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
As you can see, no hit. But when I shorten my query's value slightly,
from "american football" to "american footb", like this:
bash-3.2$ curl -XGET 'http://localhost:9200/firstindex/node/_search?pretty=true' -d '{
  "query" : {
    "fuzzy" : {
      "label" : {
        "value" : "american footb",
        "boost" : 1.0,
        "min_similarity" : 0.0,
        "prefix_length" : 0
      }
    }
  }
}'
Then I get a correct hit on my entry. The result is:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.19178301,
"hits" : [ {
"_index" : "firstindex",
"_type" : "node",
"_id" : "6TXNrLSESYepXPpFWjpl1A",
"_score" : 0.19178301, "_source" : {
"node" : {
"label" : "american football"
}
}
} ]
}
}
So, I have several questions related to this test:
- Why didn't I get any result when querying with a value exactly
  equal to my only entry, "american football"?
- Is it related to the fact that I have a multi-word value?
- Is there a way to get the "similarity" score in my query result, so
  I can better understand how to find the right threshold for my fuzzy
  queries?
- There is a page dedicated to the fuzzy query on the ElasticSearch
  web site, but I am not sure it lists all the parameters I can use
  for the fuzzy query. Where could I find such an exhaustive list?
- Same question for the other queries, actually.
- Is there a difference between a Fuzzy Query and a Query String
  Query using Lucene syntax to get fuzzy matching?
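To make my multi-word suspicion concrete, here is a rough Python sketch of what I think might be happening. It rests on several assumptions I have not verified against my setup: that the default analyzer splits the label into the two terms "american" and "football", that the fuzzy query value is compared as one unanalyzed string against each indexed term, and that the similarity is computed roughly as 1 - edit_distance / min(length of query, length of term), which is how I understand older Lucene versions to do it. The helper names `edit_distance` and `similarity` are mine, not part of any library.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def similarity(query, term):
    """ASSUMED formula: 1 - distance / length of the shorter string.

    Both strings must be non-empty. prefix_length is taken to be 0,
    as in my queries above.
    """
    return 1.0 - edit_distance(query, term) / min(len(query), len(term))


# ASSUMPTION: these are the terms the analyzer produced for my label.
indexed_terms = ["american", "football"]

for query in ["american football", "american footb"]:
    for term in indexed_terms:
        print(query, "vs", term, "->", similarity(query, term))
```

If those assumptions hold, "american football" scores -0.125 against both "american" and "football", which is below my min_similarity of 0.0, while "american footb" scores 0.25 against "american". That would match the behaviour I see, but I would appreciate confirmation from someone who knows the internals.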
Thanks in advance for your help!
Cheers,
Adrien