Scrolling through Terms aggregation?


#1

Hi!

I use Elasticsearch (ES) for a project with a telephone book in which users can alter the original entries, either for their group or just for themselves. A user can search the telephone book, but should only see the entries that are valid for him in his group context. My search looks like this:

```
GET etb/_search?pretty
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase_prefix": {
            "_all": "Max"
          }
        },
        {
          "bool": {
            "should": [
              {
                "bool": {
                  "must": [
                    { "match": { "ownerType": "USER" } },
                    { "match": { "owner": 1 } }
                  ]
                }
              },
              {
                "bool": {
                  "must": [
                    { "match": { "ownerType": "GROUP" } },
                    {
                      "bool": {
                        "should": [
                          { "match": { "owner": 5 } },
                          { "match": { "owner": 3 } },
                          { "match": { "owner": 1 } }
                        ]
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "distinctIds": {
      "terms": {
        "field": "id.keyword",
        "size": 2147483647
      }
    }
  },
  "size": 0
}
```

I know it's a bit of a monster ;D What I want to achieve is a kind of "wildcard search" over all fields for the name "Max", while making sure that every entry is owned either by my user id (ownerType USER) or by one of my group ids (ownerType GROUP). I don't want the actual entries, just the distinct ids, so that I can join the entries afterwards.
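As a minimal sketch in Python (the helper name `build_search_body` is hypothetical), the same ownership logic can be generated for any user and any number of groups. It assumes that a single `terms` clause on `owner` matches the same documents as the separate `match` clauses, which holds for a numeric or keyword `owner` field. The ownership part is placed in `filter` context because it only restricts the hits and does not need to be scored; the `_all` clause is kept exactly as in the original query.

```python
from typing import Any, Dict, List


def build_search_body(text: str, user_id: int, group_ids: List[int]) -> Dict[str, Any]:
    """Hypothetical helper: text search restricted to entries owned by the user or his groups."""
    # Entries the user changed for himself: ownerType USER and owner == user_id.
    user_owned = {
        "bool": {
            "filter": [
                {"match": {"ownerType": "USER"}},
                {"match": {"owner": user_id}},
            ]
        }
    }
    # Entries changed for one of the user's groups: ownerType GROUP and owner in group_ids.
    # One terms clause replaces the nested should-list of match clauses (assumption:
    # owner is a numeric or keyword field).
    group_owned = {
        "bool": {
            "filter": [
                {"match": {"ownerType": "GROUP"}},
                {"terms": {"owner": list(group_ids)}},
            ]
        }
    }
    return {
        "query": {
            "bool": {
                # Kept from the original query: phrase-prefix search over _all.
                "must": [{"match_phrase_prefix": {"_all": text}}],
                # Ownership only restricts the hits, so it runs in (unscored) filter context.
                "filter": [
                    {
                        "bool": {
                            "should": [user_owned, group_owned],
                            # At least one of the two ownership branches must match.
                            "minimum_should_match": 1,
                        }
                    }
                ],
            }
        }
    }
```

For the query above this would be called as `build_search_body("Max", 1, [5, 3, 1])`; adding a group then only means extending the list.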

The problem is that I only get 126, 138 or 141 buckets, although I know there must be 10,315 matches. And what would happen if the number of matches exceeded Integer.MAX_VALUE?
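On the bucket-count and Integer.MAX_VALUE question: in Elasticsearch 6.1 and later, the composite aggregation can return distinct values page by page instead of in one huge terms aggregation. The following is only a sketch: `iter_distinct_ids` is a hypothetical name, and `search` is an assumed callable that sends a request body to `etb/_search` and returns the parsed JSON response. It pages through the distinct `id.keyword` values by passing the previous page's key as `after`, and stops when a page comes back empty.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Assumed transport: takes a request body for etb/_search, returns the parsed response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_distinct_ids(search: SearchFn, query: Dict[str, Any], page_size: int = 1000) -> Iterator[str]:
    """Yield every distinct id.keyword value matching `query`, one composite page at a time."""
    after: Optional[Dict[str, Any]] = None
    while True:
        composite: Dict[str, Any] = {
            "size": page_size,
            "sources": [{"id": {"terms": {"field": "id.keyword"}}}],
        }
        if after is not None:
            # Resume right after the last key of the previous page.
            composite["after"] = after
        body = {"query": query, "size": 0, "aggs": {"distinctIds": {"composite": composite}}}
        agg = search(body)["aggregations"]["distinctIds"]
        buckets = agg["buckets"]
        if not buckets:
            return  # no more distinct ids
        for bucket in buckets:
            yield bucket["key"]["id"]
        # Prefer after_key when the response has it, else fall back to the last bucket's key.
        after = agg.get("after_key", buckets[-1]["key"])
```

Because each request only asks for `page_size` buckets, no single response has to hold all distinct ids, however many there are.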

Is this even the right way to get what I want?

Thanks in advance =)


#2

Can really nobody help with this? =(


(Adrien Grand) #3

If the expected number of unique ids is low, your current approach is fine. If the ids have a high cardinality, however, I would keep the same query, remove the aggregation, scroll through all the results using the scroll API, and collect the matching ids on the client side.
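A minimal sketch of this suggestion in Python, assuming an elasticsearch-py 7.x style client (`search(..., scroll=..., size=...)`, `scroll(scroll_id=..., scroll=...)`, `clear_scroll(scroll_id=...)`) and assuming each hit's `_source` contains the `id` field; `collect_ids_by_scroll` is a hypothetical name. Collecting into a set means an id that appears in several hits or pages is kept only once.

```python
from typing import Any, Dict, Set


def collect_ids_by_scroll(client: Any, index: str, query: Dict[str, Any],
                          page_size: int = 1000, keep_alive: str = "2m") -> Set[str]:
    """Scroll through all hits of `query` and collect the distinct values of _source.id."""
    resp = client.search(index=index, body={"query": query}, scroll=keep_alive, size=page_size)
    scroll_id = resp.get("_scroll_id")
    ids: Set[str] = set()
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                break  # an empty page means the scroll is exhausted
            for hit in hits:
                # A set keeps each id once, even if it spans several hits or pages.
                ids.add(hit["_source"]["id"])
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Free the server-side scroll context instead of waiting for keep_alive to expire.
            client.clear_scroll(scroll_id=scroll_id)
    return ids
```

The official Python client also provides `elasticsearch.helpers.scan`, which wraps the same scroll loop and yields the hits one by one.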


#4

OK, so you think the problem is the sheer number of results? If I had a more "refined" search with far fewer hits, would it be more reliable?

I don't think scrolling and collecting on the client side would work well, because the number of distinct ids per "page" would vary too much, and the same id could show up in two scroll pages because many entries share the same id. I hope this illustrates what I mean:

Suppose we have a scroll size of 5 entries; the ids of the hits on the first and second page could then look like this:

1 1 2 2 3 | 3 3 4 4 4
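The two pages in this example can be replayed in a few lines of Python: if the client remembers the ids it has already seen, id 3 is reported only once even though it spans both pages, and the number of new ids per page simply varies (3, then 1). `new_ids_per_page` is a hypothetical helper; the set of seen ids grows with the number of distinct ids, so it has to fit in client memory.

```python
from typing import Hashable, List, Sequence, Set


def new_ids_per_page(pages: Sequence[Sequence[Hashable]]) -> List[List[Hashable]]:
    """For each scroll page, return only the ids not seen earlier (on this or a previous page)."""
    seen: Set[Hashable] = set()
    result: List[List[Hashable]] = []
    for page in pages:
        fresh: List[Hashable] = []
        for doc_id in page:
            if doc_id not in seen:
                seen.add(doc_id)
                fresh.append(doc_id)
        result.append(fresh)
    return result


# The two pages from the example, with a scroll size of 5.
print(new_ids_per_page([[1, 1, 2, 2, 3], [3, 3, 4, 4, 4]]))  # [[1, 2, 3], [4]]
```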

That said, returning all ids from the server side would be OK if it's possible to exclude every field except the id field to reduce traffic. Is that possible?
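Restricting the returned fields is what source filtering does. A minimal sketch of such a request body, built by a hypothetical helper `id_only_body` and assuming the id is stored in the `id` field of `_source`: only `id` is returned for each hit, and sorting by `_doc` is the cheapest order when all hits are scrolled and their order does not matter.

```python
from typing import Any, Dict


def id_only_body(query: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: request body that returns only the `id` field of each hit's _source."""
    return {
        "query": query,
        # Source filtering: every other _source field is left out of the response.
        "_source": ["id"],
        # _doc order is the cheapest sort when all hits are scrolled and order doesn't matter.
        "sort": ["_doc"],
    }
```

The resulting body can be sent with the `scroll` parameter like any other search request.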

