Counting matching parent documents by aggregations on child documents


(Andris Priedīte) #1

I have two document types:

  1. ads (title, url and similar fields)
  2. ad_views (time, user_id, gender, age, country and similar fields). The ad_views document type has ads as its parent.

The desired end result is a search form where a user can find ads by choosing ad_views parameters. For example: find all ads that have been viewed by males aged 18-25 living in the UK. I also want to display a "faceted search" form (like on amazon.com and many other sites) that shows the number of items (ads) under each filter/search option.

The searching part (getting a filtered list of docs) is clear, but the aggregations are not. I'm trying to figure out how to aggregate on child document properties so that, instead of child document counts, I get counts of ads documents.

For example, if I have data like this:

ad: "free beer in UK"
    ad view: male
ad: "free beer in Scotland"
    ad view: male
    ad view: male
    ad view: female
ad: "free beer in Wales"
    ad view: male
    ad view: male

And a query like this:

GET /test/ads/_search
{
    "aggs": {
        "genders": {
            "children": {
                "type": "ad_views"
            },
            "aggs": {
                "gender": {
                    "terms": {
                        "field": "gender"
                    }
                }
            }
        }
    },
    "query" : {
        "match_phrase" : {
            "title" : "free beer"
        }
    }
}

The search returns three ads. The aggregation returns male=5 and female=1, that is, the total term counts across the child documents. What I want instead is the number of ads: male=3, female=1 (three ads documents have one or more male ad views, and one ads document has one or more female ad views).

How can I achieve this?


(Andris Priedīte) #2

Please let me know if the description is unclear. I really need to find a solution for this.


(Masaru Hasegawa) #3

That would require a parent aggregation, which is yet to be implemented [1].

If the number of distinct values in the child documents is limited, you could use a filter aggregation with a has_child filter as a workaround. In your particular case that means two filter aggregations, one for male and another for female.

Masaru

[1] https://github.com/elastic/elasticsearch/issues/5306
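
A minimal sketch of that workaround, assuming the gender field holds the exact terms "male" and "female". The aggregation names ads_viewed_by_male and ads_viewed_by_female are hypothetical, and the body would be sent to /test/ads/_search together with the existing query on title:

```json
{
    "aggs": {
        "ads_viewed_by_male": {
            "filter": {
                "has_child": {
                    "type": "ad_views",
                    "query": { "term": { "gender": "male" } }
                }
            }
        },
        "ads_viewed_by_female": {
            "filter": {
                "has_child": {
                    "type": "ad_views",
                    "query": { "term": { "gender": "female" } }
                }
            }
        }
    }
}
```

Each filter aggregation runs on the ads documents matched by the search, so its doc_count is a number of ads rather than a number of ad views. With the sample data from the question this should give 3 for male and 1 for female. One filter aggregation is needed per value, which is why this only suits fields with few distinct values.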


(Andris Priedīte) #4

Thanks, Masaru, I will try your proposed workaround on fields with a limited number of values.
Any ideas for a workaround on fields with many values? For example, a "city" field with potentially hundreds of values?


(Andris Priedīte) #5

I'm looking into nested / reverse nested aggregations (I think it will not be a big problem for me to switch to nested docs instead of parent/child), but I'm not sure whether these aggregations will do what I want. I'm having a hard time grasping the concepts.


(Andris Priedīte) #6

I moved ad_views into nested documents instead of children.

Now I'm trying to count ads per gender, after first filtering on the ad title and on the date of birth in the nested docs.

I added reverse_nested to get this doc count. The problem is that the "gender_count" aggregation on the nested field "ad_views.gender" returns the TOTAL number of documents that have an ad view with this gender. It ignores the date of birth in the nested documents.

See the aggregation example below. Any ideas on how to do this?

"genders": {
    "filter": {                        
        "and": [
            {"term": {"title": "free beer"}},
            {
                "nested": {
                    "path": "ad_views",
                    "filter": {
                        "and": [
                            {"range": {"ad_views.dob": {"gte": "1980-09-15", "lt": "1990-09-15"}}}
                        ]
                    }
                }
            }
        ]
    },       
    "aggs": {
        "genders1": {
            "nested": {
                "path" : "ad_views"
            },
            "aggs": {
                "gender_count": {
                    "terms": {
                        "field": "ad_views.gender"
                    },
                    "aggs": {
                        "back_to_ads": {
                            "reverse_nested": {}
                        }
                    }
                }
            }
        }
    }
}
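
One possible direction, sketched under the assumption that the date-of-birth range is meant to restrict which ad views are counted: the outer filter only selects ads, so once the nested aggregation steps into ad_views it sees every ad view of those ads, including the ones outside the range. Repeating the range as a filter aggregation inside the nested aggregation limits the terms buckets to matching ad views. The name "dob_range" is hypothetical; the other names and the dates are taken from the example above, and the fragment replaces the "aggs" part of the "genders" aggregation:

```json
{
    "aggs": {
        "genders1": {
            "nested": { "path": "ad_views" },
            "aggs": {
                "dob_range": {
                    "filter": {
                        "range": {
                            "ad_views.dob": { "gte": "1980-09-15", "lt": "1990-09-15" }
                        }
                    },
                    "aggs": {
                        "gender_count": {
                            "terms": { "field": "ad_views.gender" },
                            "aggs": {
                                "back_to_ads": { "reverse_nested": {} }
                            }
                        }
                    }
                }
            }
        }
    }
}
```

With this layout, the doc_count of each gender_count bucket counts ad views inside the date range, while the doc_count of back_to_ads in that bucket counts the ads that have at least one such view.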
