Advice on querying nested objects

We are using Elasticsearch 6.8 and are running searches against an index of "locations" that have nested "offers" within them. A location can have many nested offers, and the same offer can be nested within many locations. Locations are unique. Tags on both the locations and the offers determine whether a given location, or its nested offers, is returned in the results.

Here is our cut-down mapping with only the relevant fields:

    {
      "mappings": {
        "_doc": {
          "dynamic": "false",
          "properties": {
            "location": {
              "type": "geo_point"
            },
            "location_subscription_tag_values": {
              "type": "keyword"
            },
            "marketer_id": {
              "type": "keyword"
            },
            "name": {
              "type": "text"
            },
            "offers": {
              "type": "nested",
              "properties": {
                "id": {
                  "type": "integer"
                },
                "created_at": {
                  "type": "date"
                },
                "user_subscription_tag_values": {
                  "type": "keyword"
                },
                "user_targeting_tag_values": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    }
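To make the document shape concrete, here is a minimal sketch of what one location document might look like under this mapping, written as a Python dict. The example values and the extra `description` field are hypothetical. Because the mapping sets `"dynamic": "false"`, fields not declared in the mapping are kept in `_source` but are not indexed, so they cannot be searched; the small helper below lists such fields.

```python
import json

# Copy of the cut-down mapping "properties" from the post, as a Python dict.
PROPERTIES = {
    "location": {"type": "geo_point"},
    "location_subscription_tag_values": {"type": "keyword"},
    "marketer_id": {"type": "keyword"},
    "name": {"type": "text"},
    "offers": {
        "type": "nested",
        "properties": {
            "id": {"type": "integer"},
            "created_at": {"type": "date"},
            "user_subscription_tag_values": {"type": "keyword"},
            "user_targeting_tag_values": {"type": "keyword"},
        },
    },
}

# Hypothetical example: one location document carrying two nested offers.
# "description" is NOT in the mapping, so with "dynamic": "false" it stays
# in _source but is not indexed (and therefore not searchable).
EXAMPLE_LOCATION = {
    "marketer_id": "15",
    "name": "Example location",
    "location": {"lat": -33.8688, "lon": 151.2093},
    "location_subscription_tag_values": ["sydney"],
    "offers": [
        {"id": 1, "created_at": "2020-01-02T00:00:00Z",
         "user_subscription_tag_values": ["singlecity"]},
        {"id": 2, "created_at": "2020-01-01T00:00:00Z",
         "user_subscription_tag_values": ["singlecity"],
         "description": "not in the mapping"},
    ],
}


def unindexed_fields(doc, properties, prefix=""):
    """Return sorted dotted paths of fields in doc that the mapping does not declare."""
    found = set()
    for key, value in doc.items():
        path = prefix + key
        if key not in properties:
            found.add(path)
            continue
        sub_properties = properties[key].get("properties")
        if sub_properties is None:
            continue  # leaf field (keyword, date, geo_point, ...)
        # Nested/object fields may hold a single object or a list of them.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                found.update(unindexed_fields(item, sub_properties, path + "."))
    return sorted(found)


print(json.dumps(EXAMPLE_LOCATION, indent=2))
print(unindexed_fields(EXAMPLE_LOCATION, PROPERTIES))
```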

For our original use case this works well: we can query and return a geo-sorted list of locations, each with the one or more offers that match the tags in the query. That query:

    {
      "query": {
        "bool": {
          "must": [
            { "term": { "marketer_id": 15 } },
            {
              "nested": {
                "inner_hits": {},
                "path": "offers",
                "query": {
                  "bool": {
                    "must": [
                      {
                        "bool": {
                          "must_not": [
                            { "exists": { "field": "offers.user_targeting_tag_values" } }
                          ]
                        }
                      },
                      {
                        "bool": {
                          "must": [
                            { "terms": { "offers.user_subscription_tag_values": ["singlecity"] } }
                          ]
                        }
                      }
                    ]
                  }
                }
              }
            },
            {
              "bool": {
                "must": [
                  { "terms": { "location_subscription_tag_values": ["bali", "sydney"] } }
                ]
              }
            }
          ]
        }
      },
      "_source": true,
      "sort": [
        {
          "_geo_distance": {
            "location": { "lat": 30.4352131, "lon": 72.3570404 },
            "order": "asc",
            "unit": "km",
            "mode": "min",
            "distance_type": "arc",
            "ignore_unmapped": true
          }
        },
        "_score"
      ]
    }
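If the tag lists, marketer and user position vary per request, the same body can be generated from parameters. This is a minimal sketch that reproduces the query above clause for clause as a Python dict; the function name `build_location_query` is hypothetical. The resulting dict can be serialised with `json.dumps` and sent as the request body to the index's `_search` endpoint.

```python
import json


def build_location_query(marketer_id, offer_tags, location_tags, lat, lon):
    """Build the geo-sorted location search from the post as a Python dict."""
    # Offers with no targeting tags that match at least one subscription tag.
    offer_query = {
        "bool": {
            "must": [
                {"bool": {"must_not": [
                    {"exists": {"field": "offers.user_targeting_tag_values"}}]}},
                {"bool": {"must": [
                    {"terms": {"offers.user_subscription_tag_values": list(offer_tags)}}]}},
            ]
        }
    }
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"marketer_id": marketer_id}},
                    # inner_hits returns the matching offers alongside each location
                    {"nested": {"inner_hits": {}, "path": "offers", "query": offer_query}},
                    {"bool": {"must": [
                        {"terms": {"location_subscription_tag_values": list(location_tags)}}]}},
                ]
            }
        },
        "_source": True,
        "sort": [
            {
                "_geo_distance": {
                    "location": {"lat": lat, "lon": lon},
                    "order": "asc",
                    "unit": "km",
                    "mode": "min",
                    "distance_type": "arc",
                    "ignore_unmapped": True,
                }
            },
            "_score",  # tie-breaker after distance
        ],
    }


body = build_location_query(15, ["singlecity"], ["bali", "sydney"], 30.4352131, 72.3570404)
print(json.dumps(body, indent=2))
```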

However, we now need to pull a different set of results and are struggling to do so. In the new use case, we want to display a list of locations ordered by their newest offers, using the date on the offer. For each offer, only the single closest location that has it should be shown. So the first result should be the newest offer at its closest location, the second result the second-newest offer at its closest location, and so on.
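To pin down the desired ordering independently of Elasticsearch, here is a plain-Python reference sketch over an in-memory list of locations. The input shape (`id`, `lat`, `lon`, `offers` with `id` and `created_at`) and the function names are assumptions for illustration. Distance is the great-circle distance from the haversine formula, which may differ slightly from Elasticsearch's own `arc` calculation.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def newest_offers_with_closest_location(locations, origin):
    """Return [(offer_id, location_id)], newest offer first, each offer once,
    paired with the location nearest to origin that carries it."""
    best = {}  # offer_id -> (distance_km, created_at, location_id)
    for loc in locations:
        d = haversine_km(origin[0], origin[1], loc["lat"], loc["lon"])
        for offer in loc["offers"]:
            created = datetime.fromisoformat(offer["created_at"])
            current = best.get(offer["id"])
            # Keep only the closest location seen so far for this offer.
            if current is None or d < current[0]:
                best[offer["id"]] = (d, created, loc["id"])
    ordered = sorted(best.items(), key=lambda item: item[1][1], reverse=True)
    return [(offer_id, loc_id) for offer_id, (_, _, loc_id) in ordered]


locations = [
    {"id": "A", "lat": 0.0, "lon": 0.0, "offers": [
        {"id": 1, "created_at": "2020-01-02T00:00:00"},
        {"id": 2, "created_at": "2020-01-01T00:00:00"}]},
    {"id": "B", "lat": 0.0, "lon": 1.0, "offers": [
        {"id": 1, "created_at": "2020-01-02T00:00:00"}]},
]
print(newest_offers_with_closest_location(locations, (0.0, 0.9)))
```

Here offer 1 exists at both A and B but is reported only once, at B, because B is nearer to the origin.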

This is as close as we've been able to get. It orders the locations by the date on the nested offers, which is correct, and it also sorts the nested offers in the inner_hits, so taking the first one gives us the newest. However, when an offer is present in more than one location, the same offer is repeated at each of those locations, which we don't want. We only want the single closest location that has the newest offer before moving on to the next-newest offer, rather than showing the same offer at 100+ locations before moving on to the next. We looked at collapsing results on offer.id, but collapse does not appear to work on nested objects.

    {
      "query": {
        "bool": {
          "must": [
            { "term": { "marketer_id": 15 } },
            {
              "nested": {
                "inner_hits": {
                  "sort": [
                    { "offers.created_at": { "order": "desc" } }
                  ]
                },
                "path": "offers",
                "query": {
                  "bool": {
                    "must": [
                      {
                        "bool": {
                          "must_not": [
                            { "exists": { "field": "offers.user_targeting_tag_values" } }
                          ]
                        }
                      },
                      {
                        "bool": {
                          "must": [
                            { "terms": { "offers.user_subscription_tag_values": ["singlecity"] } }
                          ]
                        }
                      }
                    ]
                  }
                }
              }
            },
            {
              "bool": {
                "must": [
                  { "terms": { "location_subscription_tag_values": ["bali", "sydney"] } }
                ]
              }
            }
          ]
        }
      },
      "_source": true
    }
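As a stop-gap with the existing nested structure, the repeats could be removed client-side. This sketch assumes two changes to the query above: a top-level `_geo_distance` sort (as in the first query) so that hits arrive nearest-first, and a larger `inner_hits` `size` (the default is 3), so each location returns all of its matching offers. With hits ordered nearest-first, the first hit in which an offer appears is its closest location. It only de-duplicates the page of hits actually returned, so an offer's truly closest location can be missed if it falls outside that page. The function name is hypothetical, and `created_at` values are assumed to share one ISO 8601 format.

```python
from datetime import datetime


def newest_offer_per_closest_location(response):
    """Return [(offer_id, location _id)], newest offer first, each offer once.

    Assumes response["hits"]["hits"] is sorted nearest-first by a top-level
    _geo_distance sort, and inner_hits size is large enough to list every
    matching offer for each location.
    """
    chosen = {}  # offer id -> (created_at, location _id)
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get("offers", {}).get("hits", {}).get("hits", [])
        for offer_hit in inner:
            offer = offer_hit["_source"]  # the nested offer object itself
            # The first time an offer appears, this hit is its closest location.
            if offer["id"] not in chosen:
                created = datetime.fromisoformat(offer["created_at"].replace("Z", "+00:00"))
                chosen[offer["id"]] = (created, hit["_id"])
    ordered = sorted(chosen.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(offer_id, loc_id) for offer_id, (_, loc_id) in ordered]


fake_response = {"hits": {"hits": [
    {"_id": "B", "inner_hits": {"offers": {"hits": {"hits": [
        {"_source": {"id": 1, "created_at": "2020-01-02T00:00:00Z"}}]}}}},
    {"_id": "A", "inner_hits": {"offers": {"hits": {"hits": [
        {"_source": {"id": 1, "created_at": "2020-01-02T00:00:00Z"}},
        {"_source": {"id": 2, "created_at": "2020-01-01T00:00:00Z"}}]}}}},
]}}
print(newest_offer_per_closest_location(fake_response))
```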

So our question is: is what we're trying to do possible with our existing index structure, given that we haven't been able to get there? Or do we need to look at changing the index structure? Any thoughts are appreciated.
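For comparison, one possible restructuring is to denormalize: index one document per (offer, location) pair with the offer's fields copied to the top level, so that `collapse` can be used. Collapse requires a single-valued `keyword` or numeric field with doc values, and keeps the top-sorted document per collapse key. Sorting by the offer date descending and then by distance ascending should therefore return each offer once, at its closest location, newest first. This is a sketch only, assuming a hypothetical index whose field names (`offer_id`, `offer_created_at`, `offer_user_subscription_tag_values`, `offer_user_targeting_tag_values`) are invented for illustration.

```python
import json


def build_collapsed_query(marketer_id, offer_tags, location_tags, lat, lon, size=10):
    """Query body for a HYPOTHETICAL flattened index (one doc per offer/location pair)."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"marketer_id": marketer_id}},
                    {"terms": {"offer_user_subscription_tag_values": list(offer_tags)}},
                    {"terms": {"location_subscription_tag_values": list(location_tags)}},
                ],
                "must_not": [{"exists": {"field": "offer_user_targeting_tag_values"}}],
            }
        },
        "sort": [
            # Newest offers first; all docs of one offer share this value...
            {"offer_created_at": {"order": "desc"}},
            # ...so distance decides which location is the top doc per offer.
            {"_geo_distance": {
                "location": {"lat": lat, "lon": lon},
                "order": "asc",
                "unit": "km",
                "distance_type": "arc",
            }},
        ],
        # Keep only the top-sorted document for each offer.
        "collapse": {"field": "offer_id"},
    }


collapsed = build_collapsed_query(15, ["singlecity"], ["bali", "sydney"], 30.4352131, 72.3570404)
print(json.dumps(collapsed, indent=2))
```

The trade-off is index size and write volume: an offer present at many locations becomes many documents.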
