Need to return part of a doc from a search query/filter; is parent-child the way to go?


#1

I have a set of docs (let's call them books), each of which has a subset of information (let's say editions), for a data structure somewhat akin to this:

"book": {
    "author": "A. N. Author",
    "title": "Fantastic Queries and How to Index Them",
    "editions": [
        {
          "publisher":"penguin",
          "isbn": 124161256653,
          "coverArtist":"Pain T Brush",
          "amazonPrice":65.50
        },
        {
          "publisher":"orbit",
          "isbn": 124163526653,
          "coverArtist":"Pain T Brush",
          "amazonPrice":25.99
        },
        {
          "publisher":"tor",
          "isbn": 124169876353,
          "coverArtist":"Pen See Il",
          "amazonPrice":700.00
        }
    ]
}

Right now with the queries I have (which search on editions.publisher or editions.isbn), I get the whole document back, including editions that don't match the query (I believe this is what's known as a flattened/normalized data structure?). So, to be clear, if I search coverArtist for Pain T Brush, the data I want returned is:

"book": {
    "author": "A. N. Author",
    "title": "Fantastic Queries and How to Index Them",
    "editions": [
        {
          "publisher":"penguin",
          "isbn": 124161256653,
          "coverArtist":"Pain T Brush",
          "amazonPrice":65.50
        },
        {
          "publisher":"orbit",
          "isbn": 124163526653,
          "coverArtist":"Pain T Brush",
          "amazonPrice":25.99
        }
    ]
}

Same with the other queries. If I search for a specific isbn, I only want its data to come back. If I set a price range of > 500, I'd only want the last edition (along with the author and title information, of course). And finally, if I search for Fantastic Queries and How to Index Them in the title, I want the whole doc returned, with all edition information, since I didn't specify anything edition-specific.

I was made aware of child and parent docs, in which each edition would be a child of one parent book. However, querying this system returns either a child or a parent, not both, which is what I need. Granted, if I get a child, its _parent property can be used to GET the parent's data, but that seems inefficient (i.e. I have to get two separate docs with two separate GET requests; first the child result, then its parent).

So really, I wanted to double check here that I'm not missing some obvious solution. Is this parent-child doc structure the only way to achieve what I need, or is there something else I can use?


(Xavier Facq) #2

Hi,

Keeping exactly the same structure, you can do what you want in the search by using the nested functionality.

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html

In your mapping, add this to the field "editions":

"type" : "nested"

It's a little bit "harder" to understand, but you will get the documents (books) matching your query AND the nested sub-elements (editions) matching the query.

Bye,
Xavier


(David Pilato) #3

Yes. Use nested docs and have a look at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html#nested-inner-hits


#4

To clarify, right now my mapping for editions looks something like this:

editions: {
        properties: {
          coverArtist: stringWithRaw,
          isbn: {type: "integer"},
          publisher: stringWithRaw,
          amazonPrice: {type: "double"},
        }
    }

where stringWithRaw is simply

let stringWithRaw = {
      type:"string",
      fields:{
        "raw":{type:"string", index:"not_analyzed"}
      }
    };

So all I have to do is to add type: "nested" above where I declare the properties of editions?


(Xavier Facq) #5

Yes, like this:

editions: {
        "type": "nested",
        properties: {
            coverArtist: stringWithRaw,
            isbn: {type: "integer"},
            publisher: stringWithRaw,
            amazonPrice: {type: "double"},
        }
    }


#6

My initial testing is showing that I still get the editions that don't match the query... this is the query I'm testing with:

GET books/_search
{
    "query": {
        "nested" : {
            "path" : "editions",
            "query" : {
                "bool" : {
                    "must" : [
                        { "match" : {"editions.coverArtist" : "Pain T Brush"} }
                    ]
                }
            }
        }
    }
}

But I still get resulting book objects that contain edition objects with the other artist (Pen See Il) in the hits results.

EDIT: I should clarify; in the hits array that is returned, I want the _source that's returned to have its editions array trimmed/modified/filtered to only include the editions that match the query. Not sure if the nesting is capable of accomplishing this; not sure if it can modify the returned _source.

I have a follow-up question: can I do partial-word matching in nested queries? The field I'm now running nested queries on used to be searched with QueryString, which I'm guessing is not going to work with nested. However, I'd really like to have partial-word matching (i.e. searching coverArtist for "Pain T Br" should return the same results as searching with the full name).


(Xavier Facq) #7

You should have your documents in the "hits" array with all editions attached; that's the main documents array, returned in full. In a second part of each result you should find the "inner_hits" arrays, which contain ONLY the nested documents matching your query (as long as the nested query asks for them with "inner_hits": {}).
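
A minimal sketch of such a request, assuming the books index and the nested editions mapping from earlier in the thread. The helper name buildNestedQuery is hypothetical; inner_hits is the nested-query option described at the link in post #3.

```javascript
// Hypothetical helper: builds the body for GET books/_search.
// The empty inner_hits object asks Elasticsearch to return, for each matching
// book, the nested editions that matched the nested query.
function buildNestedQuery(field, value) {
  return {
    query: {
      nested: {
        path: "editions",
        query: {
          match: { ["editions." + field]: value }
        },
        inner_hits: {}
      }
    }
  };
}

const body = buildNestedQuery("coverArtist", "Pain T Brush");
console.log(JSON.stringify(body, null, 2));
```

Each entry of hits.hits should then carry its full _source plus an inner_hits.editions section listing only the matching editions.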

As for your new question, it depends on what you want to allow as a query. A good starting point is to add this parameter:

"minimum_should_match" : "75%" 

@see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-minimum-should-match.html
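
A rough sketch of where that parameter could go, assuming a match query inside the nested query from post #6. The helper name buildMinimumMatchQuery is hypothetical. Note that minimum_should_match controls how many of the analyzed terms must match; it does not by itself match partial words.

```javascript
// Hypothetical helper: a nested match query that requires a share of the
// analyzed search terms to match, using the long form of the match query.
function buildMinimumMatchQuery(field, text, minimum) {
  return {
    query: {
      nested: {
        path: "editions",
        query: {
          match: {
            ["editions." + field]: {
              query: text,
              minimum_should_match: minimum
            }
          }
        },
        inner_hits: {}
      }
    }
  };
}

console.log(JSON.stringify(buildMinimumMatchQuery("coverArtist", "Pain T Brush", "75%"), null, 2));
```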


#8

I see now, all the relevant info is stored in inner_hits. It's a bit annoying that it's separate from the usual _source data I use, but I can work with that.
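
For anyone who wants the trimmed _source shape from the first post, here is a minimal client-side sketch that rebuilds it from inner_hits. The helper name trimEditions and the sample hit are hypothetical; it assumes editions sits at the top level of _source and that inner_hits uses its default name, which is the nested path ("editions").

```javascript
// Hypothetical helper: returns a copy of a search hit's _source whose
// "editions" array holds only the editions reported under inner_hits.
function trimEditions(hit) {
  const inner = hit.inner_hits && hit.inner_hits.editions;
  if (!inner) {
    // No inner_hits section (e.g. a title-only query): keep every edition.
    return hit._source;
  }
  const matched = inner.hits.hits.map((h) => h._source);
  return Object.assign({}, hit._source, { editions: matched });
}

// Hand-written stand-in for one entry of the response's hits.hits array.
const sampleHit = {
  _source: {
    author: "A. N. Author",
    title: "Fantastic Queries and How to Index Them",
    editions: [
      { publisher: "penguin", coverArtist: "Pain T Brush" },
      { publisher: "tor", coverArtist: "Pen See Il" }
    ]
  },
  inner_hits: {
    editions: {
      hits: { hits: [{ _source: { publisher: "penguin", coverArtist: "Pain T Brush" } }] }
    }
  }
};

console.log(trimEditions(sampleHit).editions.length); // 1
```

By default inner_hits returns at most three matches per book, so its size option would need raising if a book can have more matching editions than that.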

What I allow as a query is partial-word searches, since I have a simple text input field that searches as people type. Can I not use some type of prefix matching? For example, I've used multi_match with type: "phrase_prefix" before; will that work with nested docs (or something like match_phrase_prefix)?


(Xavier Facq) #9

That's good! You can mark this post as resolved and open a new one with the query you want to do.

Note that all queries are possible within nested: bool, multi_match, etc. No limit!
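
As an illustration of that, a minimal sketch of prefix matching inside a nested query, assuming the same editions mapping with a standard-analyzed coverArtist field. The helper name buildNestedPrefixQuery is hypothetical; match_phrase_prefix is the query named in post #8.

```javascript
// Hypothetical helper: search-as-you-type on a nested edition field.
// match_phrase_prefix treats the last term of the input as a prefix,
// so "Pain T Br" can match "Pain T Brush".
function buildNestedPrefixQuery(field, typedText) {
  return {
    query: {
      nested: {
        path: "editions",
        query: {
          match_phrase_prefix: { ["editions." + field]: typedText }
        },
        inner_hits: {}
      }
    }
  };
}

console.log(JSON.stringify(buildNestedPrefixQuery("coverArtist", "Pain T Br"), null, 2));
```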

