Elasticsearch / Solr: dis_max query skipping matching clauses

andreaschiffo · September 16, 2021, 5:43pm

I have an Elasticsearch index on a single shard, against which I'm running a [dis_max][4] query. Given some user details (First Name (FN), Last Name (LN), Date of Birth (DOB), Address, Phone, Username, Email, etc.), the query searches the index for users by combining a set of criteria/matching clauses.

For example:

- match username ([fuzzy][1], boosted 2x)
- should match first and last name ([bool][3] combining [term][2] queries for FN and LN, boosted 1.1x)
- must match FN, LN and DOB ([bool][3] combining [fuzzy][1] for FN and LN and [term][2] for DOB, boosted 3x)
- match phone ([term][2], boosted 2x)
- etc.

Resources:

[1]: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html
[2]: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
[3]: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
[4]: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html
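As a rough sketch of the structure described above, here is the request body as a Python dict. The field names (`first_name`, `last_name`, `dob`, `username`, `phone`) and the sample values are hypothetical; the real query is in the linked gist. It also includes the `tie_breaker` of 0.5 and the `"explain": true` flag mentioned in this post.

```python
import json


def build_user_query(fn: str, ln: str, dob: str, username: str, phone: str) -> dict:
    """Hypothetical reconstruction of the dis_max clauses listed above.

    Field names are assumptions; adapt them to the actual index mapping.
    """
    return {
        "explain": True,  # return a per-hit _explanation tree
        "query": {
            "dis_max": {
                "tie_breaker": 0.5,
                "queries": [
                    # username: fuzzy, boosted 2x
                    {"fuzzy": {"username": {"value": username, "boost": 2.0}}},
                    # first/last name: bool "should" of term queries, boosted 1.1x
                    {"bool": {
                        "should": [
                            {"term": {"first_name": {"value": fn}}},
                            {"term": {"last_name": {"value": ln}}},
                        ],
                        "boost": 1.1,
                    }},
                    # FN + LN (fuzzy) + DOB (term), all required, boosted 3x
                    {"bool": {
                        "must": [
                            {"fuzzy": {"first_name": {"value": fn}}},
                            {"fuzzy": {"last_name": {"value": ln}}},
                            {"term": {"dob": {"value": dob}}},
                        ],
                        "boost": 3.0,
                    }},
                    # phone: term query, boosted 2x
                    {"term": {"phone": {"value": phone, "boost": 2.0}}},
                ],
            }
        },
    }


if __name__ == "__main__":
    body = build_user_query("jane", "doe", "1990-01-01", "jdoe", "5550100")
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request.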

See the query below (input data obscured):

https://gist.github.com/andreaschiffo/bf5ebeac1d6875a1a78dbb9e2eb8e19b

Every criterion contributes to the score, and I've set `tie_breaker` to 0.5 so that a result's score is the highest clause score plus 0.5 times the sum of the scores of the other matching clauses.
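The tie-breaker arithmetic can be written out as a small Python function. The clause scores in the example are made up for illustration; the point is that a clause that does not match a document contributes nothing, so losing the highest-boosted clause lowers the total sharply.

```python
def dis_max_score(clause_scores: list[float], tie_breaker: float = 0.5) -> float:
    """Combine the scores of the clauses that matched one document.

    dis_max takes the best clause score and adds tie_breaker times the
    scores of every other matching clause. Non-matching clauses are simply
    absent from clause_scores.
    """
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


# Hypothetical scores: all three clauses match vs. the top clause missing.
print(dis_max_score([6.0, 2.0, 1.0]))  # 6.0 + 0.5 * 3.0 = 7.5
print(dis_max_score([2.0, 1.0]))       # 2.0 + 0.5 * 1.0 = 2.5
```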

Running the query with a few input combinations:

- in some cases I get good scores that produce good matches;
- in other cases, where I'd expect the same or a similarly high score, I get a very low score because some of the most relevant matching clauses seem to be skipped.

I debugged the query execution with `"explain": true`, and in the explanation:

- the first result scores a high value, with all query clauses contributing;
- the second result (which, based on the data, should score well enough) scores a low value, and some clauses don't appear in the explanation at all, as if they had been excluded/ignored.
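To compare two hits' explanations side by side, a small helper can flatten each `_explanation` tree and list which fields contributed a score. This is a sketch that assumes the usual explanation shape (`value`, `description`, optional `details` list) and Lucene-style descriptions such as `weight(first_name:jane in 3)`; that description format is an implementation detail, so the field extraction is a heuristic. The sample tree in the usage below is hand-made, not real output.

```python
import re

# Heuristic: pull the field name out of descriptions like
# "weight(first_name:jane in 3) [PerFieldSimilarity], result of:".
_WEIGHT_FIELD = re.compile(r"weight\(([\w.]+):")


def flatten_explanation(node: dict, depth: int = 0) -> list[str]:
    """Render an _explanation tree as indented 'value  description' lines."""
    indent = "  " * depth
    lines = [f"{indent}{node['value']:.4f}  {node['description']}"]
    for child in node.get("details", []):
        lines.extend(flatten_explanation(child, depth + 1))
    return lines


def matched_fields(node: dict) -> set[str]:
    """Collect field names that contributed a weight(...) score anywhere in the tree."""
    fields = set()
    match = _WEIGHT_FIELD.search(node.get("description", ""))
    if match:
        fields.add(match.group(1))
    for child in node.get("details", []):
        fields |= matched_fields(child)
    return fields


if __name__ == "__main__":
    sample = {  # hand-made example, not real Elasticsearch output
        "value": 7.5,
        "description": "max plus 0.5 times others of:",
        "details": [
            {"value": 6.0, "description": "weight(first_name:jane in 3) [PerFieldSimilarity], result of:"},
            {"value": 3.0, "description": "weight(phone:5550100 in 3) [PerFieldSimilarity], result of:"},
        ],
    }
    print("\n".join(flatten_explanation(sample)))
    print(sorted(matched_fields(sample)))
```

With a real response, `matched_fields(hits[0]["_explanation"]) - matched_fields(hits[1]["_explanation"])` (where `hits` is `response["hits"]["hits"]`) shows which fields scored for the first hit but not the second.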

I'd like to understand why these clauses would be ignored/skipped in some cases.
Is anybody aware whether this could be an issue in the way ES translates queries into Lucene (the search library underneath both Elasticsearch and Solr)?

See the example results below (all data obscured, but the two results have quite similar values field by field).

https://gist.github.com/andreaschiffo/e8c3d6b2f86c53ba6a28257d47a1831b

system · October 14, 2021, 5:43pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.