_field_names aggregation with sub-aggregation


#1

After running a terms aggregation on _field_names, I want to run a sub-aggregation on each field. For each field, I want to know whether it contains a non-null value (e.g. using missing or exists). I do not know how to tell the sub-aggregation to aggregate on the field of the current bucket. I need a way to do this because I don't specify any field names myself.

This is what I have. It's wrong.

GET _search
{   "size": 0,
    "query": {
        "term": {"property": "TGC"}
    },
    "aggs": {
        "try": {
            "terms": {
                "field": "_field_names",
                "size": 0
            },
            "aggs": {
                "a_counting_sub_agg": {
                    "missing" : { "field" : "_field_names" }
                }
            }
        }
    }
}

Here are examples of what I want

Ex.A

GET _search
{   "size": 0,
    "query": {
        "term": {"property": "TGC"}
    },
    "aggs": {
        "try": {
            "terms": {
                "field": "_field_names",
                "size": 0
            },
            "aggs": {
                "a_counting_sub_agg": {
                    "missing" : { "field" : "the_current_field.value" }
                }
            }
        }
    }
}

Ex.B

GET _search
{   "size": 0,
    "query": {
        "term": {"property": "TGC"}
    },
    "aggs": {
        "try": {
            "terms": {
                "field": "_field_names",
                "size": 0
            },
            "aggs": {
                "a_counting_sub_agg": {
                    "term" : { "some_property_every_field_has" : "that_property.value" }
                }
            }
        }
    }
}

My desired output would resemble the following (lines with // are my comments):

"aggregations": {
   "try": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0, // should this change??
      "buckets": [
         {
            "key": "bookingDate",
            "doc_count": 7142,
            "a_counting_sub_agg": {
               "doc_count": 10 // As in, 10 records had a "missing" bookingDate
            }
         },
         {
            "key": "checkinDate",
            "doc_count": 7142,
            "a_counting_sub_agg": {
               "doc_count": 0
            }
         },
         {
            "key": "checkoutDate",
            "doc_count": 7142,
            "a_counting_sub_agg": {
               "doc_count": 20 // Means 20 docs had no value for checkoutDate.
            }
         },
         etc...

I feel like there is some special keyword I don't know that would make this happen. How do I refer to the current field and check whether or not it has a value of some sort? For every field in my index, I'm trying to get a count of how many documents had or didn't have a value.


(Colin Goodheart-Smithe) #2

Unfortunately, you cannot currently access the key of the bucket from a sub-aggregation, so what you want to do is not currently possible in a single request.

You could achieve this in two requests by using the terms aggregation on the _field_names field in a first request, and then sending a second request that has a missing aggregation for each field name returned from the first request.
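
As a rough illustration of the two-request approach, here is a minimal Python sketch that takes the response of the first request and builds the body of the second one. It only constructs the JSON body and does not talk to a cluster. The function names, the missing_0, missing_1, ... aggregation names, and the trimmed sample response are all assumptions made for this example; the aggregation name "try" and the query come from post #1.

```python
import json


def field_names_from_response(response, agg_name="try"):
    """Return the bucket keys of the terms aggregation on _field_names."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


def build_missing_request(field_names, query):
    """Build the second request body: one `missing` aggregation per field.

    Aggregations are named missing_0, missing_1, ... (a hypothetical naming
    scheme) so that field names containing dots never end up in an
    aggregation name; missing_<i> corresponds to field_names[i].
    """
    aggs = {
        "missing_%d" % i: {"missing": {"field": name}}
        for i, name in enumerate(field_names)
    }
    return {"size": 0, "query": query, "aggs": aggs}


# Trimmed-down first response, shaped like the desired output in post #1.
first_response = {
    "aggregations": {
        "try": {
            "buckets": [
                {"key": "bookingDate", "doc_count": 7142},
                {"key": "checkinDate", "doc_count": 7142},
            ]
        }
    }
}

query = {"term": {"property": "TGC"}}
field_names = field_names_from_response(first_response)
second_request = build_missing_request(field_names, query)

# This body would be sent as the second GET _search request.
print(json.dumps(second_request, indent=2))
```

The doc_count of each missing_<i> aggregation in the second response would then be the number of matching documents with no value for field_names[i].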


#3

Thanks for your feedback. I think I found the solution. The sub-agg may not be needed.
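
The post does not say what the solution was. One possible reading, offered only as an assumption: _field_names records the names of the fields that hold a non-null value in each document, so the doc_count of each bucket in the original terms aggregation is already the number of documents that have a value for that field. The missing count would then be the total number of hits minus that doc_count. The sketch below shows the arithmetic in Python; the function name and the sample numbers are made up for illustration, and the total is assumed to be available as a plain integer.

```python
def missing_counts(response, total_hits, agg_name="try"):
    """Derive per-field missing counts from a terms agg on _field_names.

    Assumes each bucket's doc_count is the number of matching documents
    that have a non-null value for that field.
    """
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: total_hits - b["doc_count"] for b in buckets}


# Hypothetical numbers: 7152 matching documents in total.
sample_response = {
    "aggregations": {
        "try": {
            "buckets": [
                {"key": "bookingDate", "doc_count": 7142},
                {"key": "checkinDate", "doc_count": 7152},
            ]
        }
    }
}

counts = missing_counts(sample_response, total_hits=7152)
print(counts)  # {'bookingDate': 10, 'checkinDate': 0}
```

A caveat with this approach: a field that is absent from every matching document never appears as a bucket, so it would have to be detected separately.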

