Scrolling through Terms aggregation?

cherry-wave · May 4, 2017, 9:14am

Hi!

I use Elasticsearch for a project with a telephone book whose original entries users can override, either for their group or just for themselves. A user searching the telephone book should only see the entries that are valid for them in their group context. My search looks like this:

GET etb/_search?pretty
{
   "query": {
      "bool": {
        "must": [
          {
            "match_phrase_prefix" : {
              "_all" : "Max"
            }
          },
          {
            "bool": {
              "should": [
                {
                  "bool": {
                    "must": [
                      {
                        "match": {
                          "ownerType": "USER"
                        }
                      },
                      {
                        "match": {
                          "owner": 1
                        }
                      }
                    ]
                  }
                },
                {
                  "bool": {
                    "must": [
                      {
                        "match": {
                          "ownerType": "GROUP"
                        }
                      },
                      {
                        "bool": {
                          "should": [
                            {
                              "match": {
                                "owner": 5
                              }
                            },
                            {
                              "match": {
                                "owner": 3
                              }
                            },
                            {
                              "match": {
                                "owner": 1
                              }
                            }
                          ]
                        }
                      }
                    ]
                  }
                }
              ]
            }
          }
        ]
      }
    },
    "aggs" : {
        "distinctIds" : {
            "terms" : {
              "field" : "id.keyword",
              "size": 2147483647
            }
        }
    },
    "size": 0
}

I know it's a little monster ;D What I want to achieve is a kind of "wildcard search" over all fields for the name "Max", restricted to entries owned either by my user id or by one of my group ids. I don't want the actual entries, just the distinct ids, so that I can join the entries afterwards.

The problem is that I only get 126, 138 or 141 buckets, although I know there must be 10,315 matches. And what would happen if the number of matches exceeded Integer.MAX_VALUE?

Is this even the right way to get what I want?

Thanks in advance =)

cherry-wave · May 9, 2017, 8:43am

Can really nobody help with this? =(

jpountz · May 9, 2017, 9:05am

If the expected number of unique ids is low, your current approach is fine. If the field has a high cardinality, I would keep the same query, remove the aggregation, scroll through all the results using the _scroll API, and collect the matching ids on the client side.
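
A minimal sketch of that scroll-and-collect loop, in Python. It assumes the ids live in a source field named `id` (inferred from the `id.keyword` aggregation in the question) and uses a hypothetical `post(path, body)` callable standing in for whatever HTTP client sends JSON to Elasticsearch; `collect_distinct_ids` and `fake_post_factory` are illustrative names, not part of any client library. The in-memory stand-in only exists so the sketch runs without a cluster.

```python
from typing import Any, Callable, Dict, List, Set

# Hypothetical transport: sends a JSON body to the given path and returns
# the decoded JSON response. Replace with a real HTTP client call.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def collect_distinct_ids(post: Post, index: str, query: Dict[str, Any],
                         page_size: int = 1000) -> Set[str]:
    """Scroll through every hit of `query` and gather the distinct ids."""
    ids: Set[str] = set()
    # The first request opens the scroll context; sorting by _doc is the
    # cheapest order when the ranking does not matter.
    resp = post(f"/{index}/_search?scroll=1m",
                {"query": query, "size": page_size, "sort": ["_doc"]})
    while resp["hits"]["hits"]:
        for hit in resp["hits"]["hits"]:
            ids.add(hit["_source"]["id"])  # assumes a source field named "id"
        # Each follow-up request returns the next page of the same scroll.
        resp = post("/_search/scroll",
                    {"scroll": "1m", "scroll_id": resp["_scroll_id"]})
    return ids


def fake_post_factory(docs: List[Dict[str, Any]], page_size: int) -> Post:
    """In-memory stand-in for Elasticsearch, for demonstration only."""
    state = {"pos": 0}

    def post(path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        page = docs[state["pos"]:state["pos"] + page_size]
        state["pos"] += len(page)
        return {"_scroll_id": "demo",
                "hits": {"hits": [{"_source": d} for d in page]}}

    return post


demo_docs = [{"id": "a"}, {"id": "b"}, {"id": "a"}, {"id": "c"}]
found = collect_distinct_ids(fake_post_factory(demo_docs, 2), "etb",
                             {"match_all": {}}, page_size=2)
```

Against a real cluster, the loop would also want to release the scroll context once it finishes (`DELETE /_search/scroll`), which the sketch leaves out.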

cherry-wave · May 9, 2017, 1:42pm

OK, so you think the problem here is the sheer number of results. If I had a more "refined" search with far fewer hits, would it be more reliable?

I don't think scrolling and collecting on the client side would work well: the number of distinct ids per "page" would vary too much, and the same id could show up in two scroll pages when many entries share it. I hope this example explains what I mean:

Suppose we have a scroll size of 5 entries:

1 1 2 2 3 | 3 3 4 4 4

I could work out the distinct ids on my server side, though. That would be OK, provided it's possible to exclude all fields except the id field to reduce traffic. Is it?
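
For illustration, a small Python sketch of the two points raised here, under stated assumptions. The request body uses Elasticsearch source filtering (`_source`) to return only the id field, assuming that field is called `id` as in the aggregation above. The two pages reproduce the `1 1 2 2 3 | 3 3 4 4 4` example, and a set absorbs the id that straddles the page boundary. The variable names are illustrative only.

```python
import json

# Request body for the scrolled search: "_source" restricts each hit to the
# id field, so the other fields of an entry are not sent over the wire.
body = {"size": 5, "_source": ["id"], "sort": ["_doc"]}

# The two scroll pages from the example above (scroll size of 5).
pages = [[1, 1, 2, 2, 3], [3, 3, 4, 4, 4]]

seen = set()
for page in pages:
    # Adding an id that is already in the set is a no-op, so id 3 is
    # counted once even though it appears on both pages.
    seen.update(page)

distinct = sorted(seen)
print(json.dumps(body))
print(distinct)  # [1, 2, 3, 4]
```

The pages still contain varying numbers of distinct ids, but that only matters if the pages themselves are shown to the user; for building the full id list, only the final set is used.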

