Query child documents, with values from child and parent document

Hello.

Lets say I have made an index of books. Each book has a name and a publish date. Each book also has a list of users that have read the said book. These users have an username and a birthday date.

The mapping would be something like this:

{
  "book": {
    "dynamic": "strict",
    "properties": {
      "name" : {"type" : "text"},
      "publishDate" : {"type" : "date"},
      "reader": {
        "dynamic": "strict",
        "properties": {
          "username" : {"type" : "text"},
          "birthDay" : {"type" : "date"}
        }
      }
    }
  }
}

I would like people using my service to make a search of users, who have been born after year 2000, and have read a book published before 1990. The response should be a list of all of these readers (username and birthday) who match the query. And if possible, I would also like the response to include a list of all books for each reader, that they have read.

Is something like this possible? I have read the documentation, but im new to elastic search, so I could not figure it out. Im using elastic 5.5.

What you're describing is a many-to-many relationship (a book can have multiple readers, and a reader can have read multiple books). Elasticsearch does not really have a construct for these N:N relationships. The "Handling Relationships" section of "Elasticsearch: The Definitive Guide" provides some good guidance on dealing with relational data. (That book is a bit outdated, but considering you're still on version 5 most of it will still be applicable)

There are various solutions to what you're trying to achieve, and one is not necessarily better than the other. It all depends on how you're going to query the data and what you want Elasticsearch to return. Based on what you've provided I'd go with a reader-centric index, in which every reader becomes an Elasticsearch document. To every reader document you could then add an array of nested objects with all the books that they've read ("denormalization"). You can query these nested objects using a nested query.

It would look something like this. You define the index with a reader type mapping, in which a books field is mapped as a nested type:

PUT books
{
  "mappings": {
    "reader": {
      "dynamic": "strict",
      "properties": {
        "username": {
          "type": "text"
        },
        "birthDay": {
          "type": "date"
        },
        "books": {
          "type": "nested",
          "dynamic": "strict",
          "properties": {
            "name": {
              "type": "text"
            },
            "publishDate": {
              "type": "date"
            }
          }
        }
      }
    }
  }
}

You can now index your documents. One document per reader, with an array of the books that they've read:

PUT books/reader/1
{
  "username": "Jane Doe",
  "birthDay": "2001-01-01",
  "books": [
    {
      "name": "Elasticsearch: The Definitive Guide",
      "publishDate": "2015-02-07"
    },
    {
      "name": "Stranger in a Strange Land",
      "publishDate": "1961-06-01"
    }
  ]
}

PUT books/reader/2
{
  "username": "John Smith",
  "birthDay": "1970-07-07",
  "books": [
    {
      "name": "Elasticsearch: The Definitive Guide",
      "publishDate": "2015-02-07"
    }
  ]
}

Now, if you want to find all readers who have been born after year 2000, and have read a book published before 1990, you could write a query like this:

GET books/reader/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "books",
            "query": {
              "range": {
                "books.publishDate": {
                  "lt": "1990-01-01"
                }
              }
            }
          }
        },
        {
          "range": {
            "birthDay": {
              "gt": "2000-01-01"
            }
          }
        }
      ]
    }
  }
}

This query only returns Jane, as she's read a book published in 1961, and she was born in 2001.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.