Facets/Aggregations and excluding filters


(Pawel) #1

We're currently evaluating Elasticsearch 1.7 against Solr 5. One of our use cases is facets/aggregations combined with filters. When a filter is set on a field that is also faceted/aggregated, that filter should be excluded from the field's own facet; otherwise the facet would return just one value.

In Solr we achieve this with filter tagging, as follows:

select?qt=fieldsearch&q=*:*&start=0&rows=0
&fq={!tag=FILTER1}((field_1:("3098")))
&facet=true&facet.limit=-1&facet.mincount=1
&facet.field={!ex=FILTER1 key=FILTER1}field_1
&facet.field=field_2
&facet.field=field_3
&wt=json&indent=off
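
For readers less familiar with the tag/exclude pairing, here is a rough Python sketch that assembles the same parameters. The helper name `build_solr_params` is made up for illustration; the parameter values are copied from the request above, and nothing here is sent to a server.

```python
from urllib.parse import urlencode

def build_solr_params(filter_tag, filter_query, excluded_field, other_fields):
    """Assemble Solr select params where one facet ignores a tagged filter.

    Hypothetical helper, for illustration only.
    """
    params = [
        ("qt", "fieldsearch"),
        ("q", "*:*"),
        ("start", "0"),
        ("rows", "0"),
        # Tag the filter so that individual facets can refer to it by name.
        ("fq", "{!tag=%s}%s" % (filter_tag, filter_query)),
        ("facet", "true"),
        ("facet.limit", "-1"),
        ("facet.mincount", "1"),
        # ex=<tag> makes this facet ignore the filter carrying that tag.
        ("facet.field", "{!ex=%s key=%s}%s" % (filter_tag, filter_tag, excluded_field)),
    ]
    # The remaining facets are computed with every filter applied.
    params += [("facet.field", field) for field in other_fields]
    params.append(("wt", "json"))
    return params

params = build_solr_params(
    "FILTER1", '((field_1:("3098")))', "field_1", ["field_2", "field_3"]
)
query_string = urlencode(params)
```

Only the facet on field_1 carries `ex=FILTER1`, so it is the only facet computed without the field_1 filter.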

To get similar results in Elasticsearch we're using the following query:

{
    "size": 0,
    "aggs": {
        "field1": {
            "terms": {
                "field": "field1",
                "size": 0
            }
        },
        "facets_with_all_filters": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "field1": "3102"
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "field2": {
                    "terms": {
                        "field": "field2",
                        "size": 0
                    }
                },
                "field3": {
                    "terms": {
                        "field": "field3",
                        "size": 0
                    }
                }
            }
        }
    }
}

Solr performs much better in this case. Is there a better way to write this query in Elasticsearch, or a way to optimize it?


(Zachary Tong) #2

Apologies in advance, I'm not super familiar with Solr's syntax. But I'm pretty sure those two queries are asking for different results. The Solr query is asking for:

  • All search hits which have field_1:"3098"
  • All terms in field_1 whose documents match FILTER2 + FILTER3
  • All terms in field_2 whose documents match FILTER1 + FILTER3
  • All terms in field_3 whose documents match FILTER1 + FILTER2
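
To make the "each facet ignores its own filter" idea concrete, here is a small in-memory Python sketch. The documents and the second filter value are made up for illustration, and it uses two filters rather than three to keep it short.

```python
from collections import Counter

# Made-up sample documents, purely for illustration.
DOCS = [
    {"field_1": "3098", "field_2": "a", "field_3": "x"},
    {"field_1": "3098", "field_2": "b", "field_3": "x"},
    {"field_1": "3102", "field_2": "a", "field_3": "y"},
    {"field_1": "3102", "field_2": "a", "field_3": "x"},
]

def facet_counts(docs, filters, facet_field):
    """Count the terms of facet_field over the documents that match every
    filter except the one placed on facet_field itself."""
    active = {f: v for f, v in filters.items() if f != facet_field}
    matching = [d for d in docs if all(d[f] == v for f, v in active.items())]
    return Counter(d[facet_field] for d in matching)

filters = {"field_1": "3098", "field_2": "a"}
field_1_facet = facet_counts(DOCS, filters, "field_1")  # ignores the field_1 filter
field_2_facet = facet_counts(DOCS, filters, "field_2")  # ignores the field_2 filter
field_3_facet = facet_counts(DOCS, filters, "field_3")  # applies both filters
```

With these documents the field_1 facet still lists both "3098" and "3102", even though the search itself is restricted to "3098".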

You have some syntax problems, but assuming the hierarchy means nesting, your Elasticsearch aggregation is asking for:

  • All search hits in the index (no filter)
  • All terms in field_1
  • All terms in field_2 whose documents match field_1:3102
    • For each term in the previous aggregation, generate all terms in field_3

A more comparable query would look something like this (annotated with comments):

{
  "size": 0,
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "term": { "field_1": "3098" } // &fq={!tag=FILTER1}((field_1:("3098")))
      }
    }
  },
  "aggs": {

    // &facet.field={!ex=FILTER1 key=FILTER1_FACET}field_1
    "FILTER1_FACET": {
      "global": {},    // Global context, since the search is filtering FILTER1 and we don't want that
      "aggs": {
        "FILTER1_FACET_FILTERED": {  // sub-aggregations must be named; this name is arbitrary
          "filter": {
            "bool": {
              "must": [
                { "term": { "<FILTER2 FIELD>": "<some value>" } },
                { "term": { "<FILTER3 FIELD>": "<some value>" } }
              ]
            }
          },
          "aggs": {
            "FILTER1_FACET_TERMS": {
              "terms": { "field": "field_1" }
            }
          }
        }
      }
    },

    // &facet.field={!ex=FILTER2 key=FILTER2_FACET}field_2
    "FILTER2_FACET": {
      "filter": {  // Already includes FILTER1 from filtered query, so include FILTER3
        "term": { "<FILTER3 FIELD>": "<some value>" }  
      },
      "aggs": {
        "FILTER2_FACET_TERMS": {
          "terms": {
            "field": "field_2"
          }
        }
      }
    },

    // &facet.field={!ex=FILTER3 key=FILTER3_FACET}field_3
    "FILTER3_FACET": {
      "filter": {  // Already includes FILTER1 from filtered query, so include FILTER2
        "term": { "<FILTER2 FIELD>": "<some value>" }
      },
      "aggs": {
        "FILTER3_FACET_TERMS": {
          "terms": {
            "field": "field_3"
          }
        }
      }
    }
  }
}
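
If the set of filters changes per request, a request body along these lines could be generated rather than written by hand. The Python below is only a sketch under a few assumptions: the function names are made up, the main query is a plain match_all, it uses the ES 1.x `filtered` query as above, and it has not been run against a live cluster. For simplicity it puts every facet in the global context and re-applies all the other filters explicitly, which is a slight variation on the hand-written query.

```python
import json

def term_filters(filters, skip=None):
    """Build a filter matching every (field, value) pair except `skip`."""
    clauses = [{"term": {f: v}} for f, v in filters.items() if f != skip]
    # With nothing left to apply, fall back to a filter that matches everything.
    return {"bool": {"must": clauses}} if clauses else {"match_all": {}}

def build_body(filters, facet_fields):
    """filters: dict of field -> value; facet_fields: fields to facet on.

    Each facet is computed with every filter except the one on its own field.
    Aggregation names ("<field>_facet", "filtered", "values") are arbitrary.
    """
    aggs = {}
    for field in facet_fields:
        aggs[field + "_facet"] = {
            "global": {},  # step outside the filtered search
            "aggs": {
                "filtered": {
                    "filter": term_filters(filters, skip=field),
                    "aggs": {"values": {"terms": {"field": field}}},
                }
            },
        }
    return {
        "size": 0,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": term_filters(filters),
            }
        },
        "aggs": aggs,
    }

body = build_body(
    {"field_1": "3098", "field_2": "some value"},
    ["field_1", "field_2", "field_3"],
)
print(json.dumps(body, indent=2))
```

Because `global` also discards the main query, this variation only gives the same counts as the hand-written version while the main query stays a match_all.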

As for performance, it's pretty hard to compare the two. ES is distributed by nature, while Solr has the benefit of being monolithic. If I understand correctly, Solr also does a fair amount of aggressive caching, and those caches are invalidated when you index new documents, so a comparison can easily be skewed if you are only running searches and not indexing at the same time.

