Running document level "AND" between nested objects


#1

Hi, I have an index for authors, one document per author. Each book is a nested document:

PUT my_index2
{
  "mappings": {
    "_doc": {
      "properties": {
        "books": {
          "type": "nested" 
        }
      }
    }
  }
}


PUT my_index2/_doc/1
{
  "author_name" : "Charles Dickens",
  "author_birth": "2/7/1812",
  "books" : [
    {
      "book_name": "Oliver Twist",
      "book_type" : "novel",
      "publish_date" :  1838
    },
    {
      "book_name": "Great Expectations",
      "book_type" : "novel",
      "publish_date" :  1861
    }
  ]
}

PUT my_index2/_doc/2
{
  "author_name" : "Anthony Trollope",
  "author_birth": "5/24/1815",
  "books" : [
    {
      "book_name": "Doctor Thorne",
      "book_type" : "novel",
      "publish_date" :  1858
    },
    {
      "book_name": "Orley Farm",
      "book_type" : "novel",
      "publish_date" :  1861
    }
  ]
}

How to query authors, who have books with "publish_date" 1838 AND 1861?
Clearly the result contains "Charles Dickens" and does NOT contain "Anthony Trollope"


(Jaspreet Singh) #2

If you are looking to create a query that returns authors with books with publish_date 1838 AND 1861, as you noted, it would only return "Charles Dickens" and not "Anthony Trollope" as result collection.
Are you looking for help on how to construct such a query?


#3

Yes, exactly.


(Jaspreet Singh) #4

This?

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "publish-date": "1838"
          }
        },
        {
          "term": {
            "publish-date": "1861"
          }
        }
      ]
    }
  }
}

#5

It doesn't work on nested fields


(Jaspreet Singh) #6

Ah I see what you are saying. My bad, I did not see the nested object type.
So, for nested objects, regular queries do not work as internally nested objects are indexed as separate documents. So whatever query we do, it runs on each object as a separated document.
Since at a given time, each object has only one publish_date value, even if we need AND, we cant do must. It would need to be should. Even then, as a result of hits, you will get back the entire document, that in turn will contain nested objects, that do not match the query since multiple nested objected (books) exist under same document (authors). The solution there is to leverage inner_hits, that gives access to the matching nested objects within the document.
Having said that, this is the query that worked for me.

{
    "query": {
        "bool": {
            "must": [
                {
                    "nested": {
                        "path": "books",
                        "query": {
                            "bool": {
                                "should": [
                                    {
                                        "term": {
                                            "books.publish_date": "1838"
                                        }
                                    },
                                    {
                                        "term": {
                                            "books.publish_date": "1861"
                                        }
                                    }
                                ]
                            }
                        },
                        "inner_hits": {
                            "highlight": {
                                "fields": {
                                    "books.author_name": {}
                                }
                            }
                        }
                    }
                }
            ]
        }
    }
}

It returns ...

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "my_index2",
        "_type": "doc1",
        "_id": "2",
        "_score": 0.6931472,
        "_source": {
          "author_name": "Anthony Trollope",
          "author_birth": "5/24/1815",
          "books": [
            {
              "book_name": "Doctor Thorne",
              "book_type": "novel",
              "publish_date": "1858"
            },
            {
              "book_name": "Orley Farm",
              "book_type": "novel",
              "publish_date": "1861"
            }
          ]
        },
        "inner_hits": {
          "books": {
            "hits": {
              "total": 1,
              "max_score": 0.6931472,
              "hits": [
                {
                  "_nested": {
                    "field": "books",
                    "offset": 1
                  },
                  "_score": 0.6931472,
                  "_source": {
                    "book_name": "Orley Farm",
                    "book_type": "novel",
                    "publish_date": "1861"
                  }
                }
              ]
            }
          }
        }
      },
      {
        "_index": "my_index2",
        "_type": "doc1",
        "_id": "1",
        "_score": 0.6931472,
        "_source": {
          "author_name": "Charles Dickens",
          "author_birth": "2/7/1812",
          "books": [
            {
              "book_name": "Oliver Twist",
              "book_type": "novel",
              "publish_date": "1838"
            },
            {
              "book_name": "Great Expectations",
              "book_type": "novel",
              "publish_date": "1861"
            }
          ]
        },
        "inner_hits": {
          "books": {
            "hits": {
              "total": 2,
              "max_score": 0.6931472,
              "hits": [
                {
                  "_nested": {
                    "field": "books",
                    "offset": 1
                  },
                  "_score": 0.6931472,
                  "_source": {
                    "book_name": "Great Expectations",
                    "book_type": "novel",
                    "publish_date": "1861"
                  }
                },
                {
                  "_nested": {
                    "field": "books",
                    "offset": 0
                  },
                  "_score": 0.6931472,
                  "_source": {
                    "book_name": "Oliver Twist",
                    "book_type": "novel",
                    "publish_date": "1838"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

#7

Finally I found the answer, without using should and inner_hits:

GET my_index2/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "nested": {
                        "path": "books",
                        "query": {
                            "term": {
                                "books.publish_date": "1838"
                            }
                        }
                    }
                },
                {
                    "nested": {
                        "path": "books",
                        "query": {
                            "term": {
                                "books.publish_date": "1861"
                            }
                        }
                    }
                }
            ]
        }
    }
}