Obtaining a list of filtered document properties

alcalin · June 26, 2020, 10:02am

Hello,

I have the following index mappings:

{
    "mappings": {
        "properties": {
            "library": {
                "type": "keyword"
            },
            "books": {
                "properties": {
                    "name": {
                        "type": "keyword"
                    },
                    "id": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

The following indexed documents:

{"index":{"_index":"library","_id":"1"}}
{     "library": "amsterdam",     "books": [{"id":"1", "name": "Book1" }, {"id":"2", "name": "Book2"}, { "id":"3", "name": "Book3" }] }
{"index":{"_index":"library","_id":"2"}}
{     "library": "paris",     "books": [{"id":"4", "name": "Book4" }, {"id":"2", "name": "Book2"}, { "id":"5", "name": "Book5" }] }
{"index":{"_index":"library","_id":"3"}}
{     "library": "berlin",     "books": [{"id":"5", "name": "Book5" }, {"id":"2", "name": "Book2"}, { "id":"6", "name": "Book6" }] }

My goal is to create a multi field search like they have in https://youtrack.jetbrains.com/issues for example, but in a much more simplified version.
My example focuses on a 'library' that can have multiple 'books', and each book can exist in multiple libraries at the same time. To provide 'as-you-type' book name suggestions, I have tried combining a filter aggregation with a terms aggregation as follows:

{
    "size": 0,
    "aggs": {
        "filtered_books": {
            "filter": {
                "term": {
                    "books.name": "Book6"
                }
            },
            "aggs": {
                "books": {
                    "terms": {
                        "field": "books.name"
                    }
                }
            }
        }
    }
}

With the following results:

"filtered_books": {
    "doc_count": 1,
    "books": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "Book2",
                "doc_count": 1
            },
            {
                "key": "Book5",
                "doc_count": 1
            },
            {
                "key": "Book6",
                "doc_count": 1
            }
        ]
    }
}

From my understanding of the above, filtering on "Book6" matches the "berlin" document. That document contains two other books as well, so running the terms aggregation on it returns three buckets. I want only "Book6" in the final aggregation.
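A small sketch of why this happens, assuming (as in the mapping above) that `books` is a plain object field and not `nested`: the array is flattened into one multi-valued `books.name` field per document, the filter selects whole documents, and the terms aggregation then counts every value in them. The Python below only simulates that behaviour on the three sample documents; the helper names are made up for illustration. The second function builds a request body using the `include` option of the terms aggregation, which limits the buckets to the listed terms. It has not been run against 7.6.2.

```python
from collections import Counter

# The three sample documents, with `books` flattened the way a non-nested
# object field is stored: one multi-valued `books.name` per document.
DOCS = [
    {"library": "amsterdam", "books.name": ["Book1", "Book2", "Book3"]},
    {"library": "paris", "books.name": ["Book4", "Book2", "Book5"]},
    {"library": "berlin", "books.name": ["Book5", "Book2", "Book6"]},
]


def simulate_filtered_terms(docs, wanted, include=None):
    """Mimic a filter aggregation followed by a terms aggregation."""
    # The filter keeps whole documents that contain the term.
    matching = [d for d in docs if wanted in d["books.name"]]
    counts = Counter()
    for doc in matching:
        # Every distinct value in a matching document gets a bucket.
        for name in set(doc["books.name"]):
            if include is None or name in include:
                counts[name] += 1
    return dict(sorted(counts.items()))


def terms_with_include(names):
    """Request body: terms aggregation limited to the given exact terms."""
    return {
        "size": 0,
        "aggs": {
            "books": {
                "terms": {"field": "books.name", "include": list(names)}
            }
        },
    }


if __name__ == "__main__":
    # Without `include`, all three books of the berlin document show up.
    print(simulate_filtered_terms(DOCS, "Book6"))
    # With `include`, only the requested term keeps a bucket.
    print(simulate_filtered_terms(DOCS, "Book6", include={"Book6"}))
```

For prefix-style suggestions, `include` also accepts a regular expression string (for example `"Boo.*"`) instead of an exact list; on a keyword field the match is case-sensitive.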

As I've mentioned, my goal is to have a filtered list of 'books' based on their name to be provided in multi-field search suggestions.
I'm not sure this is the correct approach to achieving that.
ES 7.6.2

Thank you

alcalin · June 27, 2020, 10:34am

So apparently I cannot find an edit button for my own post. I will reply to it instead ( maybe it is intended, don't know since it is my first post here).
I wanted to say that I am beginning to suspect that my approach might not be the right one. For the sake of building a 'search query suggester', I might need to store the aggregation results in a separate index and perform regular filtering/search on that index. The downside I see is that I would have to run another aggregation each time a new library is added and write its results to the second index. I am curious what the Elasticsearch experts have to say about this approach.

Vinayak_Sapre · June 27, 2020, 11:42pm

Hi Alexandru,
Welcome to the community.

In your mapping you have 3 fields: library, book name and book id. When you say "multi-field search", are you planning to match both library and book name? Or are you looking for a book name across all libraries?

Have you considered flattening the structure by adding the library field to each book? Instead of a library as a document, you will have a book as a document. This will also make it easier to add a new book to any library.

So instead of

{"index":{"_index":"library","_id":"1"}}
{"library": "amsterdam", "books": [{"id":"1", "name": "Book1" }, {"id":"2", "name": "Book2"}, { "id":"3", "name": "Book3" }] }

Use

{"index":{"_index":"library"}}
{"id":"1", "name": "Book1", "library": "amsterdam" }
{"index":{"_index":"library"}}
{"id":"2", "name": "Book2", "library": "amsterdam" } 
{"index":{"_index":"library"}}
{"id":"3", "name": "Book3", "library": "amsterdam"}

The id will be library-specific and can be duplicated. This is fine since we are not using it as the document id (which is _id).
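To make the flattened layout concrete, here is a rough Python sketch. It assumes `name` stays mapped as `keyword` in the per-book index; `flatten_library` and `suggest_body` are illustrative helper names, not part of any client library. The first function produces the per-book documents shown above, and the second builds a search body that returns the unique book names starting with a typed prefix.

```python
def flatten_library(doc):
    """Turn one library document into one document per book."""
    return [
        {"id": book["id"], "name": book["name"], "library": doc["library"]}
        for book in doc["books"]
    ]


def suggest_body(prefix, size=10):
    """Search body: unique book names starting with `prefix`."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"prefix": {"name": {"value": prefix}}},
        "aggs": {"names": {"terms": {"field": "name", "size": size}}},
    }


if __name__ == "__main__":
    amsterdam = {
        "library": "amsterdam",
        "books": [
            {"id": "1", "name": "Book1"},
            {"id": "2", "name": "Book2"},
            {"id": "3", "name": "Book3"},
        ],
    }
    for book_doc in flatten_library(amsterdam):
        print(book_doc)
    print(suggest_body("Boo"))
```

Because every document now holds exactly one book name, the query filters the same values that the aggregation counts, and each bucket's `doc_count` becomes the number of libraries holding that book.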

alcalin · June 28, 2020, 12:32pm

Hi Vinayak,
Thank you for your answer. I am planning on matching both library and book name.

Flattening the structure would work for this simplified document model but the model that I have has more properties and all are searchable.
Below is the full document model:

library (name)
construction date
number of rooms
books: [Book1, Book2, Book3 etc.]
bookGenres: [Historical, Romance, Sci-Fi, Biographies etc.]

All of the fields are searchable and I wish to provide query suggestions for all except 'library name'.

Vinayak_Sapre · June 28, 2020, 7:55pm

Hi Alexandru,

Welcome to the community.
For editing your own post, if you are logged in, you will see a pencil icon at the bottom of your post, on the left of "Reply".

Back to your question.
You want to search on both library attributes and book attributes. Generally matching multiple fields is implemented using copy_to https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html
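As a rough illustration of copy_to on the mapping from the first post, the Python sketch below builds a mapping in which `library` and `books.name` are both copied into one catch-all text field, plus a match query against it. The field name `search_all` and the helper names are assumptions made for this example only.

```python
def mapping_with_copy_to(target="search_all"):
    """Index mapping that copies library and book names into `target`."""
    return {
        "mappings": {
            "properties": {
                "library": {"type": "keyword", "copy_to": target},
                "books": {
                    "properties": {
                        "name": {"type": "keyword", "copy_to": target},
                        "id": {"type": "keyword"},
                    }
                },
                # Analyzed catch-all field that receives the copied values.
                target: {"type": "text"},
            }
        }
    }


def search_all_body(text, target="search_all"):
    """Single match query against the catch-all field."""
    return {"query": {"match": {target: text}}}


if __name__ == "__main__":
    print(mapping_with_copy_to())
    print(search_all_body("berlin"))
```

The original fields stay searchable on their own, so copy_to adds a combined field without replacing per-field queries.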

I am curious what results you are expecting in each of these cases:

search string matches only a library name
search string matches only books in one or more libraries
search string matches a book and a library that has this book
search string matches a book and a library that does not have this book

alcalin · June 30, 2020, 9:02am

Hello Vinayak,

I have no pencil icon there.

copy_to will not work for me because, if I do that, I am going to lose the option to boost or apply a specific type of query to an individual field. What I have is a simple bool query whose should clause is an array of subqueries, one for each individual field.
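For readers following along, the kind of per-field query described here could look roughly like the sketch below. The choice of `term` for `library` and `prefix` for `books.name`, the boost values, and the function name are all assumptions for illustration; the actual subqueries in the poster's application are not shown in this thread.

```python
def per_field_query(text, boosts):
    """Build a bool query with one should sub-query per field."""
    should = [
        # Exact match on the library keyword field.
        {"term": {"library": {"value": text,
                              "boost": boosts.get("library", 1.0)}}},
        # Prefix match on book names, for as-you-type input.
        {"prefix": {"books.name": {"value": text,
                                   "boost": boosts.get("books.name", 1.0)}}},
    ]
    # At least one of the per-field sub-queries has to match.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    print(per_field_query("Boo", {"books.name": 2.0}))
```

Each field keeps its own query type and boost, which is exactly what a single copy_to target cannot express.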

Back to your questions

As I've mentioned earlier, searching for results and providing query completion suggestions are done on each individual field. With that in mind:

If I am providing search query suggestions for books:

1. search string matches only a library name
Nothing. But the match will not happen because I will only search on the books field.

2. search string matches only books in one or more libraries
List of unique book names that are matched.

3. search string matches a book and a library that has this book
Only the book.

4. search string matches a book and a library that does not have this book
Only the book.

But I think a solution to my question is:

creating facets from the books
storing the facets in a separate index
performing a search on the new facets index
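The three steps above can be sketched in Python as follows. This is only an outline under stated assumptions: the index name `book_suggestions`, the field names `name` and `libraries`, and the helper names are invented for the example, and the buckets are taken to come from a terms aggregation on `books.name` in the main index.

```python
import json


def facet_request(field="books.name", size=1000):
    """Step 1: terms aggregation that lists the distinct values of `field`."""
    return {
        "size": 0,
        "aggs": {"facets": {"terms": {"field": field, "size": size}}},
    }


def buckets_to_bulk(buckets, index="book_suggestions"):
    """Step 2: turn aggregation buckets into a _bulk request body (NDJSON)."""
    lines = []
    for bucket in buckets:
        # Using the book name as _id makes re-running the job overwrite
        # existing suggestions instead of duplicating them.
        lines.append(json.dumps({"index": {"_index": index, "_id": bucket["key"]}}))
        lines.append(json.dumps({"name": bucket["key"],
                                 "libraries": bucket["doc_count"]}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def suggestion_search(prefix):
    """Step 3: plain prefix search on the suggestions index."""
    return {"query": {"prefix": {"name": {"value": prefix}}}}


if __name__ == "__main__":
    sample_buckets = [
        {"key": "Book2", "doc_count": 3},
        {"key": "Book5", "doc_count": 2},
    ]
    print(buckets_to_bulk(sample_buckets), end="")
    print(suggestion_search("Boo"))
```

With many distinct book names, a single terms aggregation with a large `size` may become expensive; paging through the values with a composite aggregation is one alternative worth considering.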

The inspiration comes from the way Algolia describes creating query suggestions in the following page: https://www.algolia.com/doc/guides/getting-insights-and-analytics/leveraging-analytics-data/query-suggestions/how-to/implementing-query-suggestions/#adding-queries-by-facets

The only downside is that I have to maintain another index.
I wonder how heavy the operation of computing the facets is each time a new library is added or edited, and whether this is a recommended approach.

Thanks

system · July 28, 2020, 9:02am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.