I recently ran into a problem of missing documents in query result when
custom score script is used. After some testing, I found that the problem
seems occur when the script tries to access a field in a nested doc where a
particular root document does not contain any such nested doc.
To reproduce the problem, test data and queries can be found here: http://goo.gl/iHOc5. The example may not make much sense in real world, but
the idea is to sort products by average rate from users' review. One
particular requirement is to always treat anonymous user's rate as 3 and
assign rate as 3 for products with no reviews.
We can determine whether a user is anonymous or not by
checking review.user.member_id field is empty or not:
doc['review.user.member_id'].empty, this seems work fine except that
products with no reviews are dropped out in the result as the first query
example shows. Is this a bug? As there is no query/filter that excludes
documents, shouldn't all documents be returned?
Also, there seems no way to determine whether a review exists or not. The
second query example shows doc['review'].empty does not work, this makes
sense because indeed, there is not such field as 'review' under the
'product' index, 'review' is a nested document. However, the question
remains: is there a way to determine the existence of a nested doc?
I wonder if this is a direct result of how nested docs work ( Nested and Parent/Child Docs in ElasticSearch | PPT).
Still, it looks as though from an api perspective your queries should be
honored. I think at the very least this should be documented, but I would
file an issue with elasticsearch on github regardless.
Not sure if this is related:
Cheers,
Bob
On Friday, 10 May 2013 12:38:16 UTC-4, Junjun Zhang wrote:
Hi,
I recently ran into a problem of missing documents in query result when
custom score script is used. After some testing, I found that the problem
seems occur when the script tries to access a field in a nested doc where a
particular root document does not contain any such nested doc.
To reproduce the problem, test data and queries can be found here: http://goo.gl/iHOc5. The example may not make much sense in real world,
but the idea is to sort products by average rate from users' review. One
particular requirement is to always treat anonymous user's rate as 3 and
assign rate as 3 for products with no reviews.
We can determine whether a user is anonymous or not by
checking review.user.member_id field is empty or not:
doc['review.user.member_id'].empty, this seems work fine except that
products with no reviews are dropped out in the result as the first query
example shows. Is this a bug? As there is no query/filter that excludes
documents, shouldn't all documents be returned?
Also, there seems no way to determine whether a review exists or not. The
second query example shows doc['review'].empty does not work, this makes
sense because indeed, there is not such field as 'review' under the
'product' index, 'review' is a nested document. However, the question
remains: is there a way to determine the existence of a nested doc?
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.