Fuzzy query doesn't return all the matching records


(Yervand Aghababyan) #1

I posted this problem on Stack Overflow, but it has received no response
for several days, so I decided to raise the issue here, as solving it is
very important for my current project.
Link to the stackoverflow question:

Repost:

In my index I've got 40 rows with data like this:

| -name- | -surname- | -bdate string- | -creation date- |
| leva | agabalyan | 19560901 | 2013-09-21T11:19:13.968Z |
| leva | agabalyan | 19560901 | 2012-03-14T11:16:47.665Z |
| leva | agabalyan | 19560901 | 2012-02-19T11:38:47.972Z |
| leva | agabalyan | 19560901 | 2011-08-22T11:49:57.995Z |
.....

All of these rows have the same name, surname and birth date string fields.
The only difference is the creation date. In the real application there are
significantly more fields/columns (about 30 of them), but as you'll see
they do not participate in the query. I also have significantly more rows
like these outside the time range of my search query, so I suppose those
shouldn't make a difference either.

I also have a single row like this one:

| -name- | -surname- | -bdate string- | -creation date- |
| lyova | aghabalyan | 19560901 | 2013-06-27T11:19:33.345Z |

As you can see, the difference from the first table is in the name and
surname fields. The name differs by 2 symbols and the surname by 1.
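To double-check those counts, here is a minimal sketch of a plain Levenshtein edit distance (insertions, deletions and substitutions each cost 1). The function name `levenshtein` is just an illustrative helper, and this is not necessarily how Elasticsearch scores fuzzy matches internally; it only confirms how far apart the strings are.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b using the classic row-by-row DP."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


name_distance = levenshtein("leva", "lyova")
surname_distance = levenshtein("agabalyan", "aghabalyan")
print(name_distance, surname_distance)
```

The two values printed are the name and surname distances, in that order.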

For searching I use this query:

{
  "query": {
    "bool": {
      "must": [
        {
          "fuzzy": {
            "registration.name": {
              "value": "lyova",
              "min_similarity": "0.45"
            }
          }
        },
        {
          "fuzzy": {
            "registration.surname": {
              "value": "aghabalyan",
              "min_similarity": "0.65"
            }
          }
        },
        {
          "term": {
            "registration.birthDateStr": "19560601"
          }
        },
        {
          "range": {
            "registration.created": {
              "from": "2011-01-01",
              "to": "2014-01-01"
            }
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 50,
  "sort": [],
  "facets": {}
}

This search query returns the 1 row containing lyova and 14 rows with leva.
I don't understand why it is not returning the remaining 26 rows containing
leva. The total number of hits is also 15, while it should obviously have
been 41. It seems to me that I've encountered an elasticsearch bug here.

I'd also like to note that, IMO, the problem is not in the additional search
parameters I use (besides the fuzzy ones), because if I search for
lyova(sim:0.3) aghabalyan(sim:0.6) 19560601 dateRange I get 15 results
(1 lyova and 14 leva), but if I search for leva(sim:0.3) agabalyan(sim:0.6)
19560601 dateRange I get 40 records (40 leva and 0 lyova). I believe I
should be getting 41 results, including the lyova record, when searching
for leva.
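To make the two searches easier to compare, here is a small sketch that builds both request bodies from one template, so that only the fuzzy values differ. The helper name `build_query` is hypothetical; the field names, date range and birth date string are copied from the query as posted above. It only prints the JSON bodies and does not talk to a cluster.

```python
import json


def build_query(name, name_sim, surname, surname_sim):
    """Return the search body from this post with the fuzzy values swapped in."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"fuzzy": {"registration.name": {
                        "value": name, "min_similarity": name_sim}}},
                    {"fuzzy": {"registration.surname": {
                        "value": surname, "min_similarity": surname_sim}}},
                    # Same term and range clauses in both searches.
                    {"term": {"registration.birthDateStr": "19560601"}},
                    {"range": {"registration.created": {
                        "from": "2011-01-01", "to": "2014-01-01"}}},
                ]
            }
        },
        "from": 0,
        "size": 50,
    }


# The two variants described above: they differ only in the fuzzy values.
lyova_query = build_query("lyova", "0.3", "aghabalyan", "0.6")
leva_query = build_query("leva", "0.3", "agabalyan", "0.6")

print(json.dumps(lyova_query, indent=2))
print(json.dumps(leva_query, indent=2))
```

Since everything except the two fuzzy clauses is identical, any difference in the hit counts should come from how the fuzzy terms are matched.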



(system) #2