Is the filter aggregation broken?

I'm getting a shard failure with Elasticsearch 2.3.3 because a min aggregation is being run on an inappropriate field. However, the min aggregation is wrapped in a filter aggregation, which I thought was supposed to prevent this.

Reproducible steps:

  1. Create index1 containing type1:
POST http://elasticsearch:9200/index1
{"mappings" : {"type1" : {"properties" : {"field1" : {"type" : "string"}, "field2" : {"type" : "string"}}}}}
  2. Create index2 containing type2:
POST http://elasticsearch:9200/index2
{"mappings" : {"type2" : {"properties" : {"field2" : {"type" : "date"}, "field3" : {"type" : "string"}}}}}
  3. Index two documents:
POST http://elasticsearch:9200/_bulk
{"index" : {"_index": "index1", "_type" : "type1", "_id": "AVW13RC-KNxA0HG4GOkj"}}
{"field1": "value1", "field2": "value2"}
{"index" : {"_index": "index2", "_type" : "type2", "_id": "AVW13j8RKNxA0HG4GOkl"}}
{"field2": "2016-07-04", "field3": "value3"}
  4. Perform the following search containing a filtered min aggregation on type2.field2:
POST http://elasticsearch:9200/index1,index2/_search
{
    "query" : {"match_all" : {}},
    "from" : 0, "size" : 0,
    "aggs" : {
        "type2_field2_min_filter" : {
            "filter" : {"type" : {"value" : "type2"}},
            "aggs" : {
                "type2_field2_min" : {
                    "min" : {"field" : "field2"}
                }
            }
        }
    }
}
  5. Elasticsearch returns an unexpected shard failure:
{
    "took" : 103,
    "timed_out" : false,
    "_shards" : {
        "total" : 2,
        "successful" : 1,
        "failed" : 1,
        "failures" : [
            {
                "shard" : 0, "index" : "index1", "node" : "qh4ZMdr2TUirveJ5hrpXuA",
                "reason" : {
                    "type" : "illegal_argument_exception",
                    "reason" : "Expected numeric type on field [field2], but got [string]"
                }
            }
        ]
    },
    "hits" : {"total" : 1, "max_score" : 0.0, "hits" : []},
    "aggregations" : {
        "type2_field2_min_filter" : {
            "doc_count" : 1,
            "type2_field2_min" : {
                "value" : 1.4675904E12,
                "value_as_string" : "2016-07-04T00:00:00.000Z"
            }
        }
    }
}

As you can see, the search runs across both indices, but the filter aggregation is intended to limit the min aggregation to field2 of type2. Elasticsearch appears to run the min aggregation on field2 of type1 in index1 anyway, which results in a shard failure.

Is this an Elasticsearch bug or should I be adding logic to the calling code to ignore the shard failure in this case?

This is worse than I first thought: incorrectly running the type2.field2 aggregation on type1.field2 causes all aggregations on type1 to return zero results.

request:
{
    "query" : {"match_all" : {}},
    "from" : 0, "size" : 0,
    "aggs" : {
        "type1_field1_filter" : {
            "filter" : {"type" : {"value" : "type1"}},
            "aggs" : {
                "type1_field1_terms" : {
                    "terms" : {"field" : "field1"}
                }
            }
        },
        "type1_field2_filter" : {
            "filter" : {"type" : {"value" : "type1"}},
            "aggs" : {
                "type1_field2_terms" : {
                    "terms" : {"field" : "field2"}
                }
            }
        },
        "type2_field2_filter" : {
            "filter" : {"type" : {"value" : "type2"}},
            "aggs" : {
                "type2_field2_min" : {
                    "min" : {"field" : "field2"}
                }
            }
        },
        "type2_field3_filter" : {
            "filter" : {"type" : {"value" : "type2"}},
            "aggs" : {
                "type2_field3_terms" : {
                    "terms" : {"field" : "field3"}
                }
            }
        }
    }
}
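
With tens or hundreds of types, a request like the one above would normally be generated rather than written by hand. The following is only a sketch of how that could look in Python; the helper name `build_filtered_aggs` and the `<type>_<field>_filter` naming scheme are assumptions for illustration, not part of Elasticsearch. It only builds the request body and does nothing to avoid the shard failure.

```python
def build_filtered_aggs(specs):
    """Build one filter-wrapped aggregation per (type, field, kind) triple.

    `specs` is an iterable of (doc_type, field, kind) tuples, where `kind`
    is the aggregation name as it appears in the request body, for example
    "min" or "terms". The naming scheme is an assumption for this sketch.
    """
    aggs = {}
    for doc_type, field, kind in specs:
        aggs[f"{doc_type}_{field}_filter"] = {
            # Restrict the bucket to documents of this type.
            "filter": {"type": {"value": doc_type}},
            "aggs": {
                f"{doc_type}_{field}_{kind}": {kind: {"field": field}},
            },
        }
    return aggs


# Rebuild the four aggregations used in the request above.
search_body = {
    "query": {"match_all": {}},
    "from": 0,
    "size": 0,
    "aggs": build_filtered_aggs([
        ("type1", "field1", "terms"),
        ("type1", "field2", "terms"),
        ("type2", "field2", "min"),
        ("type2", "field3", "terms"),
    ]),
}
```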

response:
{
    "took" : 312,
    "timed_out" : false,
    "_shards" : {
        "total" : 10,
        "successful" : 5,
        "failed" : 5,
        "failures" : [
            {
                "shard" : 0, "index" : "index1", "node" : "ztZyVS7eRSqkVi85Dq8H1g",
                "reason" : {
                    "type" : "illegal_argument_exception",
                    "reason" : "Expected numeric type on field [field2], but got [string]"
                }
            }
        ]
    },
    "hits" : {"total" : 1, "max_score" : 0.0, "hits" : []},
    "aggregations" : {
        "type1_field1_filter" : {
            "doc_count" : 0,
            "type1_field1_terms" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : []
            }
        },
        "type1_field2_filter" : {
            "doc_count" : 0,
            "type1_field2_terms" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : []
            }
        },
        "type2_field2_filter" : {
            "doc_count" : 1,
            "type2_field2_min" : {
                "value" : 1.4675904E12,
                "value_as_string" : "2016-07-04T00:00:00.000Z"
            }
        },
        "type2_field3_filter" : {
            "doc_count" : 1,
            "type2_field3_terms" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                    {"key" : "value3", "doc_count" : 1}
                ]
            }
        }
    }
}

Is there any workaround for this?

Splitting the search request into per-index requests simply isn't feasible once there are tens or hundreds of types, and neither is enforcing that field names must be unique across types in separate indices.
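
Even if unique field names can't be enforced, it may help to know in advance which field names are mapped to different types in different indices, since those are the ones that can trigger this failure. A minimal sketch in Python, assuming the mappings are available as a dict of index name to a body shaped like the create-index requests above; the function name `find_field_type_conflicts` is hypothetical:

```python
from collections import defaultdict


def find_field_type_conflicts(index_bodies):
    """Return fields that are mapped to more than one type.

    `index_bodies` maps an index name to a dict with a "mappings" key,
    shaped like the create-index bodies in the reproduction steps
    (an assumption of this sketch). Only top-level properties are checked.
    """
    seen = defaultdict(dict)  # field -> {(index, doc_type): mapped type}
    for index, body in index_bodies.items():
        for doc_type, mapping in body.get("mappings", {}).items():
            for field, spec in mapping.get("properties", {}).items():
                seen[field][(index, doc_type)] = spec.get("type")
    # Keep only the fields that have at least two distinct mapped types.
    return {
        field: locations
        for field, locations in seen.items()
        if len(set(locations.values())) > 1
    }


index_bodies = {
    "index1": {"mappings": {"type1": {"properties": {
        "field1": {"type": "string"}, "field2": {"type": "string"}}}}},
    "index2": {"mappings": {"type2": {"properties": {
        "field2": {"type": "date"}, "field3": {"type": "string"}}}}},
}
conflicts = find_field_type_conflicts(index_bodies)
```

For the two indices in this thread, only field2 is reported, mapped as string in index1 and as date in index2.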

Hi @chris16,

I reformatted your posts a bit so the indentation in the JSON examples is easier to read in the browser. Regarding your first question: I'm not exactly sure whether the error you are seeing with the filter aggregation is expected or a bug.

Can you work around this by filtering for the type in the query part of the request, e.g. using a type query like

"query" : {
    "type" : {
        "value" : "type2"
    }
}

or does this fall under the "not feasible" category for you?

Sorry, I have to correct my last post: using the type query still produces the failure. The reason seems to be a known limitation in how aggregations get initialized. Even if the min aggregation never runs on any document from index1, Elasticsearch tries to load the mapping for that field on every shard before running the aggregation. On the upside, the result in your first example seems correct, so ignoring the failure in this case seems like the best workaround to me.
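
If you do ignore the failure in the calling code, it is probably safer to ignore only the failures you expect rather than all of them. A rough sketch in Python, working on the parsed search response; the function names and the idea of matching on the reason text are assumptions for illustration, not an Elasticsearch API:

```python
def failed_indices(response):
    """Return the set of index names that reported a shard failure."""
    failures = response.get("_shards", {}).get("failures", [])
    return {failure.get("index") for failure in failures}


def only_expected_failures(response, tolerated_indices,
                           reason_fragment="Expected numeric type"):
    """True if every shard failure comes from a tolerated index and its
    reason text contains `reason_fragment`. Also True when nothing failed.
    """
    for failure in response.get("_shards", {}).get("failures", []):
        reason = failure.get("reason", {})
        text = reason.get("reason", "") if isinstance(reason, dict) else str(reason)
        if failure.get("index") not in tolerated_indices:
            return False
        if reason_fragment not in text:
            return False
    return True


# Trimmed copy of the response from the first example.
response = {
    "_shards": {
        "total": 2, "successful": 1, "failed": 1,
        "failures": [{
            "shard": 0, "index": "index1",
            "reason": {
                "type": "illegal_argument_exception",
                "reason": "Expected numeric type on field [field2], but got [string]",
            },
        }],
    },
}

ok = only_expected_failures(response, tolerated_indices={"index1"})
```

Note that any index returned by `failed_indices` contributes nothing to the aggregations in that response, so results for types stored in those indices should not be trusted.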

You're right, ignoring the shard failure in the simple case is the way to go; it's the other aggregations returning zero results in the more realistic scenario that's the showstopper.

Thanks for fixing the formatting of the JSON :slight_smile: