How to filter buckets that have more than N documents using Elasticsearch DSL in Python?

I have an Elasticsearch index in which each document contains information about a user, along with the Facebook posts they have made (in a denormalized manner).

Each document contains: User_ID | User_Name | Post_Text | Post_Emojis

I want to retrieve the IDs of the users who have more than N posts.

I am new to Elasticsearch, and especially to its Python Search DSL (see the Search DSL page of the Elasticsearch DSL 7.2.0 documentation).

I am creating buckets using the terms aggregation on the User_ID field, and want to filter the buckets based on the number of documents that fall inside each bucket.
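For reference, the terms aggregation has a `min_doc_count` option, so one way to express "more than N documents per bucket" is to ask for buckets with at least N + 1 documents. The sketch below only builds the raw request body as a plain dict and reads the bucket keys out of a response dict. The function names, the `max_users` parameter, and the `user_id` field name are assumptions for illustration (a text field would typically need its keyword sub-field instead).

```python
from typing import Any, Dict, List


def build_body(num_posts: int, max_users: int = 1000) -> Dict[str, Any]:
    """Request body: one bucket per user_id having more than num_posts docs."""
    return {
        "size": 0,  # no search hits needed, only the aggregation
        "aggs": {
            "posts_count": {
                "terms": {
                    "field": "user_id",  # assumed field name
                    # terms returns only the top `size` buckets (default 10)
                    "size": max_users,
                    # "more than N" is the same as "at least N + 1"
                    "min_doc_count": num_posts + 1,
                }
            }
        },
    }


def user_ids_from_response(response: Dict[str, Any]) -> List[Any]:
    """Collect the bucket keys (user IDs) from a search response dict."""
    buckets = response["aggregations"]["posts_count"]["buckets"]
    return [bucket["key"] for bucket in buckets]
```

Each returned bucket also carries a `doc_count`, which here would be the number of posts for that user.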

This is the function I have written so far. I don't know the proper syntax and am still confused by the documentation, so I can't get it to run and return the correct response.

def users_more_posts_than_query(search_object: Search, num_posts: int):
    search_object = search_object.aggs.bucket('posts_count', 'terms', field='user_id')\
        .pipeline("having_posts", "bucket_selector", buckets_path={"postsCount": "_count"}, script=f"params.postsCount > {num_posts}")

    response = search_object.execute()

    for hit in response.hits:
            hit.user_id

Please point out what I am doing wrong here, and how I can achieve my desired goal.
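A possible correction, offered as a sketch rather than a definitive answer: in elasticsearch-dsl, `aggs.bucket(...)` changes the search in place and returns the new aggregation, so assigning its result back to `search_object` replaces the `Search` with an aggregation object that has no `execute()`. The matching users also come back as buckets under `response.aggregations`, not as entries in `response.hits`. The helper name `add_posts_filter` and `size=1000` are assumptions. The `Search` import sits under `TYPE_CHECKING` only so the snippet loads without the library; real use needs elasticsearch-dsl installed and a `Search` bound to a connection.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, List

if TYPE_CHECKING:  # used for type hints only
    from elasticsearch_dsl import Search


def add_posts_filter(search: Search, num_posts: int) -> Search:
    """Return a copy of `search` that keeps only user buckets with > num_posts docs."""
    # extra() returns a copy; size=0 skips the hits, which are not needed here.
    search = search.extra(size=0)
    # Do NOT assign this expression back to `search`: bucket() returns the
    # terms aggregation and pipeline() returns its parent aggregation.
    search.aggs.bucket(
        "posts_count", "terms", field="user_id", size=1000  # size is an assumption
    ).pipeline(
        "having_posts",
        "bucket_selector",
        buckets_path={"postsCount": "_count"},
        script=f"params.postsCount > {int(num_posts)}",
    )
    return search


def users_more_posts_than_query(search: Search, num_posts: int) -> List[Any]:
    """Run the search and return the user IDs of the surviving buckets."""
    response = add_posts_filter(search, num_posts).execute()
    # Aggregation results are under response.aggregations, keyed by the
    # aggregation name; each bucket has a .key and a .doc_count.
    return [bucket.key for bucket in response.aggregations.posts_count.buckets]
```

Calling `add_posts_filter(search_object, n).to_dict()` before executing should show the generated request body, which makes it easier to compare against the bucket selector documentation.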
