What's the best way to improve performance with thousands of filters?

The situation is about document-level security. A sample document looks like the following:

{
          "create_time": 1500000000,
          "title": "xxxxxxxxxxxxxxxxxxx",
          "access_group": ["g1", "g2716", "g3018"]
}

Say we have ten thousand groups.

It's easy for a super-admin (no group filter) or a normal user (a few group filters) to search documents, but for some special users with access to thousands of groups, search performance declines significantly.

Are there any suggestions to improve performance in this situation? Thanks for your help!

To be clear, a sample query looks like:

{
  "query": {
    "bool": {
      "filter": {
        "terms": {
          "access_group": [
            "g1",
            ....
            "g10000"
          ]
        }
      },
      "should": [
        ...
      ]
    }
  }
}
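To make the scaling concrete, here is a minimal Python sketch of how a request body like the one above is built per user (`build_search_body` is a hypothetical helper, not part of any client library). The `terms` array grows linearly with the user's group count, which is exactly what hurts the users with thousands of groups:

```python
def build_search_body(user_groups, should_clauses=None):
    """Build the bool query shown above: a terms filter over the
    user's groups plus optional should clauses. The terms array
    grows linearly with the number of groups the user belongs to."""
    return {
        "query": {
            "bool": {
                "filter": {"terms": {"access_group": user_groups}},
                "should": should_clauses or [],
            }
        }
    }

body = build_search_body(["g1", "g2716", "g3018"])
```

A super-admin skips the filter entirely; a user in 10,000 groups ships a 10,000-element `terms` array with every search.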

Hi @Morriaty

One suggestion is to use a second index if you expect to have thousands of records and frequent changes.

You keep your index as it is, with the create_time and title fields, but you remove access_group:

{
          "id": 123456,
          "create_time": 1500000000,
          "title": "xxxxxxxxxxxxxxxxxxx",
}

You save the user–group relations in a different index.

Something like a relational database, but without relation constraints.

access_group_index/doc/1
{
       "user_id": 123456,
       "group": "g1"
}

access_group_index/doc/2
{
       "user_id": 123456,
       "group": "g2716"
}
etc...

Merit: you can list all the groups with pagination, you can search them more easily, and it will be faster (depending on your request).
Demerit: you may need to make two requests, one to look up the groups and one to get the details of the documents, depending on the context.
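The two requests mentioned in the demerit can be sketched like this (Python; only the request bodies are built, and the client calls are omitted — `groups_query` and `documents_query` are hypothetical helpers, with field and index names taken from the examples above):

```python
def groups_query(user_id):
    """Request 1: fetch the user's relation documents from
    access_group_index (you would paginate this in real use)."""
    return {"query": {"term": {"user_id": user_id}}}

def documents_query(group_hits, should_clauses=None):
    """Request 2: filter documents by the groups returned by
    request 1 (group_hits are the raw hit dicts from the
    first response)."""
    groups = [hit["_source"]["group"] for hit in group_hits]
    return {
        "query": {
            "bool": {
                "filter": {"terms": {"access_group": groups}},
                "should": should_clauses or [],
            }
        }
    }
```

Note that the second request still carries one term per group, so this pattern mainly helps with listing and maintaining memberships rather than shrinking the filter itself.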

You can also duplicate your data: keep the list in the access_group field exactly as you have now, and maintain the relation list in parallel for the other kind of search. But you need to be careful, as you have to maintain two indices. It can work, depending on your constraints and your code.

I use this approach to manage tags in blogs and so far I haven't had problems.

What is the size of your data set? How many users do you have? How many distinct groups are there? How frequently do you update or change group membership? Which version are you on?

Sorry, I don't understand how two indices can help. Doesn't it still have to perform thousands of group filters against access_group_index?

Hi, here are the details:

  • document size: 1 billion
  • users: 200,000
  • groups: 1,500,000
  • update frequency: group membership changes are not frequent. No exact statistics, but I could say it is no more than 10 TPS.
  • ES version: 5.3.0
  • hardware: three master nodes with 8 cores and 16 GB memory, with 8 GB assigned to the JVM. Nine data nodes with 16 cores and 64 GB memory, with 32 GB assigned to the JVM. No SSD.

I was thinking about an alternative way to implement the logic by moving a lot of the work to indexing time rather than search time, but I do not think it will work at that scale. I am also not aware of any way to improve the performance of terms queries with a large number of terms, so I will need to leave this for someone else.

Maybe there is something that can be done by reorganizing how your data is indexed, though. How many indices and shards is the data spread across? How many queries are you serving per second? Do all queries always address all indices?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.