I've been thinking about why "nested" fields need to be handled with a
special "nested" query and aggregation type. Is it to handle the case where
there are multiple nested levels, to be able to control whether a query
involving two nested fields is within the same nested instance or across
two nested instances?
Although I agree that detecting it automatically could make things
simpler, I think that making it explicit is more flexible. For example, it
can make sense to copy field values into the root document[1]. This can
help speed up queries that don't need to know about the tree structure of
your document. In that case you have two
ways to search the same field name:
either through the root document: faster but less flexible
or through the nested document: more flexible but slower
The fact that nested queries are explicit allows you to choose the way that
you want the field to be queried.
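To make the two options concrete, here is a rough sketch of the request
bodies, written as plain Python dicts. The field name "props" and its
"name"/"value" sub-fields are made up for illustration, and I am assuming
the copy into the root document is done with the "include_in_root" mapping
option. Treat it as an outline rather than a tested setup.

```python
# Hypothetical field definition: "props" is a nested field whose values are
# also copied into the root document (assumed to be done via include_in_root).
props_field_mapping = {
    "type": "nested",
    "include_in_root": True,
    "properties": {
        "name": {"type": "keyword"},
        "value": {"type": "keyword"},
    },
}


def _name_and_value_clauses(name, value):
    """Two term clauses shared by both query variants."""
    return [
        {"term": {"props.name": name}},
        {"term": {"props.value": value}},
    ]


def root_query(name, value):
    """Query the copies in the root document.

    Cheaper, but the two clauses may be satisfied by two *different*
    nested instances of the same root document.
    """
    return {"bool": {"must": _name_and_value_clauses(name, value)}}


def nested_query(name, value):
    """Query through the nested documents.

    More expensive, but both clauses must match within the *same*
    nested instance.
    """
    return {
        "nested": {
            "path": "props",
            "query": {"bool": {"must": _name_and_value_clauses(name, value)}},
        }
    }
```

Both bodies target the same field names; only the explicit "nested" wrapper
decides which of the two behaviours you get.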
For aggregations, I think it is also nice to make it explicit so that
counts are not surprising: imagine that you have a document with properties
stored as nested documents and each property having a name. If you run a
terms aggregation on the property name from the root document, buckets will
count how many root documents have this property name. On the other hand,
if you run this terms aggregation through a nested field, this will count
the number of properties that have this name. Since each document can
have several properties, counts might be much higher.
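The difference in counts can be illustrated without a cluster. The sketch
below is a pure-Python simulation, not Elasticsearch code: the documents and
the "props" field name are invented, and the two functions only mimic what a
terms aggregation would count at each level.

```python
from collections import Counter

# Two invented root documents, each with a list of nested "props".
docs = [
    {"id": 1, "props": [{"name": "color"}, {"name": "color"}, {"name": "size"}]},
    {"id": 2, "props": [{"name": "color"}]},
]


def root_level_counts(documents):
    """Count root documents per property name.

    A root document contributes at most once per distinct name, however
    many of its nested properties carry that name.
    """
    counts = Counter()
    for doc in documents:
        counts.update({prop["name"] for prop in doc["props"]})
    return counts


def nested_level_counts(documents):
    """Count nested property documents per property name."""
    counts = Counter()
    for doc in documents:
        counts.update(prop["name"] for prop in doc["props"])
    return counts


print(root_level_counts(docs))    # color: 2, size: 1
print(nested_level_counts(docs))  # color: 3, size: 1
```

Here "color" gets 2 at the root level (two documents have it) but 3 through
the nested level (three properties have it), which is exactly the kind of
difference that would be surprising if the level were chosen implicitly.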