Obtaining a list of filtered document properties

Hello,

I have the following index mappings:

{
    "mappings": {
        "properties": {
            "library": {
                "type": "keyword"
            },
            "books": {
                "properties": {
                    "name": {
                        "type": "keyword"
                    },
                    "id": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

The following indexed documents:

{"index":{"_index":"library","_id":"1"}}
{     "library": "amsterdam",     "books": [{"id":"1", "name": "Book1" }, {"id":"2", "name": "Book2"}, { "id":"3", "name": "Book3" }] }
{"index":{"_index":"library","_id":"2"}}
{     "library": "paris",     "books": [{"id":"4", "name": "Book4" }, {"id":"2", "name": "Book2"}, { "id":"5", "name": "Book5" }] }
{"index":{"_index":"library","_id":"3"}}
{     "library": "berlin",     "books": [{"id":"5", "name": "Book5" }, {"id":"2", "name": "Book2"}, { "id":"6", "name": "Book6" }] }

My goal is to build a multi-field search like the one at https://youtrack.jetbrains.com/issues, but in a much simpler version.
My example focuses on a 'library' that can have multiple 'books', and each book can exist in multiple libraries at the same time. To provide 'as-you-type' book name suggestions, I have attempted to use a terms aggregation combined with a filter aggregation as follows:

{
    "size": 0,
    "aggs": {
        "filtered_books": {
            "filter": {
                "term": {
                    "books.name": "Book6"
                }
            },
            "aggs": {
                "books": {
                    "terms": {
                        "field": "books.name"
                    }
                }
            }
        }
    }
}

With the following results:

"filtered_books": {
    "doc_count": 1,
    "books": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "Book2",
                "doc_count": 1
            },
            {
                "key": "Book5",
                "doc_count": 1
            },
            {
                "key": "Book6",
                "doc_count": 1
            }
        ]
    }
}

As I understand it, filtering on "Book6" matches the document with library "berlin". That document contains two other books, so running the aggregation on it returns three buckets. I want only "Book6" in the final aggregation.
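A minimal sketch of why this happens, using an in-memory Python model rather than a real cluster. The filter selects whole documents, and the terms aggregation then counts every books.name value in the matched documents. One possible way to limit the buckets themselves is the `include` parameter of the terms aggregation, shown in `REQUEST`; the helper names below are hypothetical and only illustrate the semantics.

```python
from collections import Counter

# The three library documents from the question.
DOCS = [
    {"library": "amsterdam", "books": [{"id": "1", "name": "Book1"}, {"id": "2", "name": "Book2"}, {"id": "3", "name": "Book3"}]},
    {"library": "paris", "books": [{"id": "4", "name": "Book4"}, {"id": "2", "name": "Book2"}, {"id": "5", "name": "Book5"}]},
    {"library": "berlin", "books": [{"id": "5", "name": "Book5"}, {"id": "2", "name": "Book2"}, {"id": "6", "name": "Book6"}]},
]


def book_names(doc):
    """Distinct books.name values of one document (object arrays are flattened)."""
    return {book["name"] for book in doc["books"]}


def filter_then_terms(docs, name):
    """Model of the query in the question: document-level filter, then terms."""
    matched = [doc for doc in docs if name in book_names(doc)]
    return dict(Counter(n for doc in matched for n in book_names(doc)))


def terms_with_include(docs, allowed):
    """Model of a terms aggregation that only creates buckets for `allowed` keys."""
    return dict(Counter(n for doc in docs for n in book_names(doc) if n in allowed))


# Request body sketch: `include` with an array of exact values restricts the buckets.
REQUEST = {
    "size": 0,
    "aggs": {"books": {"terms": {"field": "books.name", "include": ["Book6"]}}},
}

print(filter_then_terms(DOCS, "Book6"))     # three buckets, as in the question
print(terms_with_include(DOCS, {"Book6"}))  # only the Book6 bucket
```

For as-you-type input, `include` also accepts a regular expression string such as "Book.*"; the typed text would need escaping for that regex syntax, which this sketch does not attempt.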

As I've mentioned, my goal is to have a filtered list of 'books' based on their name to be provided in multi-field search suggestions.
I'm not sure this is the correct approach to achieving that.
ES 7.6.2

Thank you

Apparently I cannot find an edit button for my own post, so I will reply to it instead (maybe that is intended; I don't know, since this is my first post here).
I am beginning to suspect that the approach I have taken might not be the right one, and that to build a 'search query suggester' I might need to store the aggregation results in a separate index and perform regular filtering/search on that index. The downside I see is that I would have to run the aggregation again each time a new library is added and write its results to the second index. I am curious what the Elasticsearch experts have to say about this approach.

Hi Alexandru,
Welcome to the community.

In your mapping you have three fields: library, book name, and book id. When you say "multi-field search", are you planning to match both library and book name, or are you looking for a book name across all libraries?

Have you considered flattening the structure by adding the library field to each book? Instead of a library as a document, you would have a book as a document. This makes it easier to add a new book to any library.

So instead of

{"index":{"_index":"library","_id":"1"}}
{"library": "amsterdam", "books": [{"id":"1", "name": "Book1" }, {"id":"2", "name": "Book2"}, { "id":"3", "name": "Book3" }] }

Use

{"index":{"_index":"library"}}
{"id":"1", "name": "Book1", "library": "amsterdam" }
{"index":{"_index":"library"}}
{"id":"2", "name": "Book2", "library": "amsterdam" } 
{"index":{"_index":"library"}}
{"id":"3", "name": "Book3", "library": "amsterdam"}

The id will be library-specific and can be duplicated. This is fine since we are not using it as the document id (which is _id).
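The flattening can be sketched in a few lines of Python. This is only an illustration: `flatten_library` is a hypothetical helper, and the index name is taken from the example above.

```python
import json


def flatten_library(doc, index="library"):
    """Turn one library document into bulk lines with one book per document."""
    lines = []
    for book in doc["books"]:
        # No _id in the action line, so Elasticsearch generates one per book document.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({"id": book["id"], "name": book["name"], "library": doc["library"]}))
    return lines


amsterdam = {
    "library": "amsterdam",
    "books": [{"id": "1", "name": "Book1"}, {"id": "2", "name": "Book2"}, {"id": "3", "name": "Book3"}],
}

# The bulk API expects newline-delimited JSON terminated by a final newline.
print("\n".join(flatten_library(amsterdam)) + "\n")
```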

Hi Vinayak,
Thank you for your answer. I am planning on matching both library and book name.

Flattening the structure would work for this simplified document model, but the model that I have has more properties, and all of them are searchable.
Below is the full document model:

  1. library (name)
  2. construction date
  3. number of rooms
  4. books: [Book1, Book2, Book3 etc.]
  5. bookGenres: [Historical, Romance, Sci-Fi, Biographies etc.]

All of the fields are searchable and I wish to provide query suggestions for all except 'library name'.

Hi Alexandru,

Welcome to the community.
For editing your own post, if you are logged in, you will see a pencil icon at the bottom of your post, on the left of "Reply".

Back to your question.
You want to search on both library attributes and book attributes. Generally, matching multiple fields is implemented using copy_to: https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html

I am curious what results you are expecting in each of these cases:

  1. search string matches only a library name
  2. search string matches only books in one or more libraries
  3. search string matches a book and a library that has this book
  4. search string matches a book and a library that does not have this book

Hello Vinayak,

I have no pencil icon there.

copy_to will not work for me, because if I use it I am going to lose the option to boost, or to apply a specific type of query to, an individual field. What I have is a simple bool query whose should clause is an array of subqueries, one for each individual field.
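A rough sketch of that kind of request body, built in Python. The field list, query types and boost values are assumptions chosen for illustration; only `term` and `prefix` subqueries are covered, because both accept the `{"value": ..., "boost": ...}` form.

```python
def build_search(text, field_specs):
    """Build a bool query with one should clause per field, each with its own boost."""
    should = [
        {spec["query"]: {spec["field"]: {"value": text, "boost": spec["boost"]}}}
        for spec in field_specs
    ]
    # At least one of the per-field subqueries has to match.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


# Hypothetical per-field settings: exact match on library, prefix match on book name.
FIELD_SPECS = [
    {"field": "library", "query": "term", "boost": 1.0},
    {"field": "books.name", "query": "prefix", "boost": 2.0},
]

body = build_search("Book", FIELD_SPECS)
print(body)
```

Keeping the subqueries separate is what preserves the per-field query type and boost that a single copy_to target field would not offer.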

Back to your questions

As I've mentioned earlier, searching for results and providing query completion suggestions are both done on each individual field. With that in mind:

If I am providing search query suggestions for books:

1. search string matches only a library name
Nothing. But the match will not happen because I will only search on the books field.

2. search string matches only books in one or more libraries
List of unique book names that are matched.

3. search string matches a book and a library that has this book
Only the book.

4. search string matches a book and a library that does not have this book
Only the book.
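The four answers above can be checked against a small in-memory model. This is a sketch of the expected behaviour only, not of how Elasticsearch would compute it; `suggest_books` is a hypothetical helper and prefix matching is an assumption about what "matches" means for as-you-type input.

```python
# Book names per library, taken from the indexed documents in the question.
DOCS = [
    {"library": "amsterdam", "books": ["Book1", "Book2", "Book3"]},
    {"library": "paris", "books": ["Book4", "Book2", "Book5"]},
    {"library": "berlin", "books": ["Book5", "Book2", "Book6"]},
]


def suggest_books(docs, typed):
    """Unique book names starting with `typed`; library names are never consulted."""
    names = {name for doc in docs for name in doc["books"] if name.startswith(typed)}
    return sorted(names)


print(suggest_books(DOCS, "amst"))   # case 1: a library name alone yields nothing
print(suggest_books(DOCS, "Book2"))  # case 2: one entry, although three libraries hold it
```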

But I think a solution to my question is:

  • creating facets from the books
  • storing the facets in a separate index
  • performing a search on the new facets index

The inspiration comes from the way Algolia describes creating query suggestions in the following page: https://www.algolia.com/doc/guides/getting-insights-and-analytics/leveraging-analytics-data/query-suggestions/how-to/implementing-query-suggestions/#adding-queries-by-facets
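The three steps above could look roughly like this in Python. The index name `book_suggestions`, the `library_count` field and the helper name are all hypothetical; using the book name as `_id` is one way to keep a single suggestion document per distinct name.

```python
import json
from collections import Counter

DOCS = [
    {"library": "amsterdam", "books": [{"id": "1", "name": "Book1"}, {"id": "2", "name": "Book2"}, {"id": "3", "name": "Book3"}]},
    {"library": "paris", "books": [{"id": "4", "name": "Book4"}, {"id": "2", "name": "Book2"}, {"id": "5", "name": "Book5"}]},
    {"library": "berlin", "books": [{"id": "5", "name": "Book5"}, {"id": "2", "name": "Book2"}, {"id": "6", "name": "Book6"}]},
]


def build_suggestion_bulk(library_docs, index="book_suggestions"):
    """Steps 1 and 2: compute the book facet and emit bulk lines for a facets index."""
    counts = Counter()
    for doc in library_docs:
        # Count each name once per library, like a terms aggregation's doc_count.
        counts.update({book["name"] for book in doc["books"]})
    lines = []
    for name in sorted(counts):
        lines.append(json.dumps({"index": {"_index": index, "_id": name}}))
        lines.append(json.dumps({"name": name, "library_count": counts[name]}))
    return lines


# Step 3: a plain prefix query against the facets index (assumes `name` is a keyword field).
SEARCH_BODY = {"query": {"prefix": {"name": {"value": "Book"}}}}

print("\n".join(build_suggestion_bulk(DOCS)))
```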

The only downside is that I have to maintain another index.
I wonder how heavy the operation of computing the facets is each time a new library is added or edited, and whether this is a recommended approach.
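One way the cost might be kept low, sketched under assumptions: if each suggestion document stores only the book name and uses that name as its `_id`, then adding or editing a library only requires re-indexing that library's own book names, since indexing an existing `_id` overwrites the document. The index name and helper are hypothetical, and removing names that no library holds any more is not handled here.

```python
import json


def upsert_lines_for_library(library_doc, index="book_suggestions"):
    """Bulk lines that add or overwrite suggestions for a single library's books."""
    lines = []
    for name in sorted({book["name"] for book in library_doc["books"]}):
        # Same _id for the same name, so repeating this for other libraries is idempotent.
        lines.append(json.dumps({"index": {"_index": index, "_id": name}}))
        lines.append(json.dumps({"name": name}))
    return lines


# Hypothetical new library that shares Book2 with the existing ones.
new_library = {"library": "rome", "books": [{"id": "7", "name": "Book7"}, {"id": "2", "name": "Book2"}]}
print("\n".join(upsert_lines_for_library(new_library)))
```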

Thanks