Facets/Aggregations and excluding filters

We're currently evaluating Elasticsearch 1.7 vs Solr 5. One of our use cases is facets/aggregations with filters. When a filter is set on a faceted/aggregated field, that filter should be excluded when computing the facet for that same field, so we don't get back just the one selected value.

In Solr we achieve this with filter tagging, as follows:

select?qt=fieldsearch&q=*:*&start=0&rows=0
&fq={!tag=FILTER1}((field_1:("3098")))
&facet=true&facet.limit=-1&facet.mincount=1
&facet.field={!ex=FILTER1 key=FILTER1}field_1
&facet.field=field_2
&facet.field=field_3
&wt=json&indent=off

To get similar results in Elasticsearch we're using the following query:

{
    "size": 0,
    "aggs": {
        "field1": {
            "terms": {
                "field": "field1",
                "size": 0
            }
        },
        "facets_with_all_filters": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "field1": "3102"
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "field2": {
                    "terms": {
                        "field": "field2",
                        "size": 0
                    }
                },
                "field3": {
                    "terms": {
                        "field": "field3",
                        "size": 0
                    }
                }
            }
        }
    }
}

Solr performs much better in this case. Is there a better way to write this query in Elasticsearch, or a way to optimize it?

Apologies in advance, I'm not super familiar with Solr's syntax, but I'm pretty sure those queries are asking for different results. The Solr query is asking for:

  • All search hits which have field_1:"3098"
  • All terms in field_1 whose documents match FILTER2 + FILTER3
  • All terms in field_2 whose documents match FILTER1 + FILTER3
  • All terms in field_3 whose documents match FILTER1 + FILTER2
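
To make the "each facet ignores its own filter" behaviour concrete, here is a small in-memory sketch in Python. It does not talk to Solr or Elasticsearch; the sample documents, the two selected filters and the helper names are all made up for illustration.

```python
from collections import Counter

# Hypothetical sample documents; only the field names follow the question.
DOCS = [
    {"field_1": "3098", "field_2": "a", "field_3": "x"},
    {"field_1": "3098", "field_2": "b", "field_3": "x"},
    {"field_1": "3102", "field_2": "a", "field_3": "y"},
    {"field_1": "3102", "field_2": "a", "field_3": "x"},
]


def matches(doc, filters):
    """True when the document satisfies every field -> value filter."""
    return all(doc.get(field) == value for field, value in filters.items())


def facet_excluding_own_filter(docs, filters, facet_fields):
    """Count terms per facet field, skipping the filter set on that same field."""
    facets = {}
    for field in facet_fields:
        others = {f: v for f, v in filters.items() if f != field}
        facets[field] = Counter(
            doc[field] for doc in docs if field in doc and matches(doc, others)
        )
    return facets


# Two selected filters (assumed for the example).
filters = {"field_1": "3098", "field_2": "a"}

# Search hits use every filter.
hits = [doc for doc in DOCS if matches(doc, filters)]

# Facets use every filter except their own.
facets = facet_excluding_own_filter(DOCS, filters, ["field_1", "field_2", "field_3"])

print(len(hits))                 # 1
print(dict(facets["field_1"]))   # {'3098': 1, '3102': 2}
print(dict(facets["field_2"]))   # {'a': 1, 'b': 1}
print(dict(facets["field_3"]))   # {'x': 1}
```

Note how field_1 still lists "3102" even though the hits are restricted to "3098". That is the part a plain filtered facet would lose.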

You have some syntax problems, but assuming the hierarchy means nesting, your Elasticsearch aggregation is asking for:

  • All search hits in the index (no filter)
  • All terms in field_1
  • All terms in field_2 whose documents match field_1:3102
    • For each term in the previous aggregation, generate all terms in field_3

A more comparable query would look something like this (annotated with comments):

{
  "size": 0,
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "term": { "field_1": "3098" } // &fq={!tag=FILTER1}((field_1:("3098")))
      }
    }
  },
  "aggs": {

    // &facet.field={!ex=FILTER1 key=FILTER1_FACET}field_1
    "FILTER1_FACET": {
      "global": {},    // Global context, since the search is filtering FILTER1 and we don't want that
      "aggs": {
        "FILTER1_FACET_FILTERED": {    // The filter has to be a named sub-aggregation (the name is arbitrary)
          "filter": {
            "bool": {
              "must": [
                { "term": { "<FILTER2 FIELD>": "<some value>" } },
                { "term": { "<FILTER3 FIELD>": "<some value>" } }
              ]
            }
          },
          "aggs": {
            "FILTER1_FACET_TERMS": {
              "terms": { "field": "field_1" }
            }
          }
        }
      }
    },

    // &facet.field={!ex=FILTER2 key=FILTER2_FACET}field_2
    "FILTER2_FACET": {
      "filter": {  // Already includes FILTER1 from filtered query, so include FILTER3
        "term": { "<FILTER3 FIELD>": "<some value>" }
      },
      "aggs": {
        "FILTER2_FACET_TERMS": {
          "terms": {
            "field": "field_2"
          }
        }
      }
    },

    // &facet.field={!ex=FILTER3 key=FILTER3_FACET}field_3
    "FILTER3_FACET": {
      "filter": {  // Already includes FILTER1 from filtered query, so include FILTER2
        "term": { "<FILTER2 FIELD>": "<some value>" }
      },
      "aggs": {
        "FILTER3_FACET_TERMS": {
          "terms": {
            "field": "field_3"
          }
        }
      }
    }
  }
}
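
If you have more than a couple of filters, writing these bodies by hand gets error-prone. Below is a rough Python sketch that generates a request of the same general shape for ES 1.x. The function names, the `selected` input shape and the aggregation names (`other_filters`, `values`) are my own inventions. Unlike the query above, it assumes every selected filter goes into the filtered query, so each facet on a filtered field steps out with `global` and re-applies only the other filters. It also assumes the main query is `match_all`, since `global` drops the query as well as the filter.

```python
import json


def term_filter(field, value):
    """One ES 1.x term filter."""
    return {"term": {field: value}}


def combine(filters):
    """Collapse a list of filters into a single filter clause."""
    if not filters:
        return {"match_all": {}}
    if len(filters) == 1:
        return filters[0]
    return {"bool": {"must": filters}}


def build_request(selected, facet_fields):
    """Build a search body where each facet ignores the filter on its own field.

    selected: dict of field -> chosen value (hypothetical input shape).
    facet_fields: fields to compute term counts for.
    """
    all_filters = [term_filter(f, v) for f, v in sorted(selected.items())]
    aggs = {}
    for field in facet_fields:
        # "size": 0 asks for all terms in ES 1.x, as in the question.
        terms = {"terms": {"field": field, "size": 0}}
        if field not in selected:
            # The filtered query already applies every filter we want here.
            aggs[field] = terms
            continue
        others = [
            term_filter(f, v) for f, v in sorted(selected.items()) if f != field
        ]
        aggs[field] = {
            "global": {},  # step outside the filtered query
            "aggs": {
                "other_filters": {
                    "filter": combine(others),
                    "aggs": {"values": terms},
                }
            },
        }
    return {
        "size": 0,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": combine(all_filters),
            }
        },
        "aggs": aggs,
    }


body = build_request({"field_1": "3098"}, ["field_1", "field_2", "field_3"])
print(json.dumps(body, indent=2))
```

With the single filter from the question, field_1 gets a global aggregation with no remaining filters, while field_2 and field_3 are plain terms aggregations that inherit the field_1 filter from the query.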

Now, as far as performance goes, it's pretty hard to compare. ES is distributed by nature, while Solr has the benefit of being monolithic. If I understand correctly, Solr also does a certain amount of aggressive caching that is invalidated when you index new documents, so a comparison can easily be skewed if you only run searches and never index during the test.