Not equal conditions in Elasticsearch aggregation

hadii · July 6, 2022, 5:21am

I have a Rails app that connects to Elasticsearch. Users can define conditions such as "at least" and "at most" to compare counts and find specific results. Everything works, but now I have to add "not equal" to my list of comparison operators, and I also have to support the negated form of every condition described above. For example, if the user searches for "not equal at least", I want to calculate the result for the user's query and then invert it. I know that inverting a query response is not possible in Elasticsearch, but can Elasticsearch calculate "not equal" by itself? Assume the user wants to know: the specific event did not happen at most 1 time in the last 30 days. I know Elasticsearch supports "not equal", but only in the bool query and the query string query, and in my case I have to use an aggregation with terms in it. The aggregation looks like this:

body_data = {
  group_by_profile_id: {
    terms: {
      field: "profile_id.keyword",
      min_doc_count: @count_first,
      size: 10000
    },
    aggs: {
      filter: {
        bucket_selector: {
          buckets_path: {
            docCount: "_count"
          },
          script: "params.docCount #{@operator} #{@counts.last}"
        }
      }
    }
  }
}

Does anyone know how I can handle a "not equal" condition in an aggregation with a terms query?

Tomo_M · July 6, 2022, 7:22am

Have you tried the must_not clause of a bool query, with a match_all query in the filter clause?

hadii · July 6, 2022, 9:06am

Thanks, @Tomo_M, for your reply. Yes, I tried that, but in this specific case it gives the wrong answer. Let me explain. Suppose a user wants to know "who did not log in at most one time in the last month", "who did not log in exactly zero times in the last month", or "who did not log in at least one time in the last month". As far as I know, for the positive versions of these conditions ("who did log in at most one time in the last month", "who did log in exactly zero times in the last month", and "who did log in at least one time in the last month") I have to use a terms query, but I don't know how to implement the "not equal" versions.

Tomo_M · July 6, 2022, 10:31am

Let's organize first. Any "not equal" condition can be converted to another condition. For example, "not equal 0" for a count is the same as "at least 1" or "more than 0". So the problem is not "not equal" itself.
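The conversion described above can be written down as a small lookup table. This is only a sketch in Ruby, assuming the operators are the comparison operators interpolated into the bucket_selector script; NEGATED_OPERATOR and negate_operator are hypothetical names, not part of any library.

```ruby
# Hypothetical lookup table: each comparison operator mapped to its logical negation.
NEGATED_OPERATOR = {
  ">=" => "<",   # not "at least n" -> "fewer than n"
  "<=" => ">",   # not "at most n"  -> "more than n"
  ">"  => "<=",
  "<"  => ">=",
  "==" => "!=",
  "!=" => "=="
}.freeze

# Returns the operator that selects exactly the buckets the given one rejects.
# Raises ArgumentError for anything outside the table.
def negate_operator(operator)
  NEGATED_OPERATOR.fetch(operator) do
    raise ArgumentError, "unsupported operator: #{operator.inspect}"
  end
end
```

For example, negate_operator("<=") gives ">", so "did not happen at most 1 time" becomes "happened more than 1 time". The negated operator still only selects among buckets that exist.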

Those conditions can be filtered with bucket_selector, as in your query. The problem, however, is that bucket_selector only selects from existing buckets. Those who did not log in during the last month may not have a bucket in the first place, and min_doc_count: 0 does not create buckets that do not exist.

You have to prepare the buckets as material first.

GET kibana_sample_data_flights/_search
{
  "size":0,
  "aggs":{
    "t":{
      "terms":{
        "field": "DestAirportID",
        "size": 10000
      },
      "aggs":{
        "feb":{
          "filter": {
            "range":{
              "timestamp":{
                "gte": "now-5M/M",
                "lt": "now-4M/M"
              }
            }
          }
        },
        "last_month":{
          "filter":{
            "range":{
              "timestamp":{
                "gte": "now/M"
              }
            }
          }
        },
        "filter":{
          "bucket_selector":{
            "buckets_path":{
              "last": "last_month._count"
            },
            "script": "params.last == 0"
          }
        }
      }
    }
  }
}

With a top-level terms aggregation, you get a bucket for every term that exists in your index. You can then filter on any condition with the bucket_selector aggregation.
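On the Rails side, the same shape could be built roughly as follows. This is a sketch under assumptions, not a definitive implementation: build_count_condition_body, ALLOWED_OPERATORS, and the aggregation names in_window and keep_matching are hypothetical, profile_id.keyword is taken from the original question, and the timestamp field name and the now-30d/d window are guesses about the index. The threshold is passed through script params rather than interpolated into the script source, and the operator is checked against a fixed list first.

```ruby
# Operators that may be placed into the script source (hypothetical whitelist).
ALLOWED_OPERATORS = %w[== != > >= < <=].freeze

# Builds a search body with one bucket per profile, a filter sub-aggregation that
# counts events inside the time window, and a bucket_selector comparing that count.
def build_count_condition_body(operator:, threshold:, window: "now-30d/d")
  unless ALLOWED_OPERATORS.include?(operator)
    raise ArgumentError, "unsupported operator: #{operator.inspect}"
  end

  {
    size: 0,
    aggs: {
      group_by_profile_id: {
        # No date range at the query level, so profiles keep their bucket
        # even when they have no events inside the window.
        terms: { field: "profile_id.keyword", size: 10_000 },
        aggs: {
          in_window: {
            # "timestamp" is an assumed field name.
            filter: { range: { timestamp: { gte: window } } }
          },
          keep_matching: {
            bucket_selector: {
              buckets_path: { eventCount: "in_window>_count" },
              script: {
                source: "params.eventCount #{operator} params.threshold",
                params: { threshold: Integer(threshold) }
              }
            }
          }
        }
      }
    }
  }
end
```

Calling build_count_condition_body(operator: "==", threshold: 0) would keep the profiles that have documents in the index but none inside the window. Profiles with no documents at all still get no bucket.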

If you want to use the filters aggregation, a sample query is as follows.

GET kibana_sample_data_flights/_search
{
  "size":0,
  "aggs":{
    "t":{
      "terms":{
        "field": "DestAirportID",
        "size": 10000
      },
      "aggs":{
        "f":{
          "filters": {
            "filters": {
              "all": {
                "match_all":{}
              },
              "Feb":{
                "range":{
                  "timestamp":{
                    "gte": "now-5M/M"
                  }
                }
              }
            }
          },
          "aggs":{
            "m":{
              "max": {
                "field": "timestamp"
              }
            }
          }
        },
        "filter":{
          "bucket_selector":{
            "buckets_path":{
              "all": "f['all']>_count",
              "feb": "f['Feb']>_count"
            },
            "script": "params.all > 1 && params.feb <300"
          }
        }
      }
    }
  }
}

With match_all

Ayush_Mathur · July 7, 2022, 1:55am

@hadii Are logs generated for all users even if they don't log in (the "exactly 0" or "at most 1" cases)? If not, you will only have partial data, because aggs/DSL cannot return those users unless you widen the range until they are included and you get hits on them.
If logs are generated with a count of 0 (or more), which of course should be taken as a param, you can either do what @Tomo_M has mentioned, or:

Filter results by a range of 1 month, keeping from and size at 0 to avoid memory consumption.
Run a terms aggregation on the user field with the maximum bucket size.
Add a nested max aggregation on login_count to get the count per user as a nested value (ctx.payload.aggregations.user.count.value).
If you want to use the same script for all cases, use an inline transform script to iterate over the users, compare each login count value with the user's input param, and express all the required conditions in that script.
If you want a separate script for each use case, you can use the filter aggregation: Filter aggregation | Elasticsearch Guide [8.3] | Elastic
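The "iterate over users and compare" step could also be done in the Rails app after the response comes back. The following is a minimal sketch, assuming the aggregations are named user and count as in the path above and that the response has already been parsed into a Hash with string keys; COMPARATORS and matching_users are hypothetical names.

```ruby
# Hypothetical condition names mapped to comparisons on the per-user count.
COMPARATORS = {
  "at_least" => ->(count, n) { count >= n },
  "at_most"  => ->(count, n) { count <= n },
  "exactly"  => ->(count, n) { count == n }
}.freeze

# Returns the keys of the user buckets that satisfy the condition,
# or that fail it when negate is true (the "did not" case).
def matching_users(response, condition:, n:, negate: false)
  compare = COMPARATORS.fetch(condition)
  buckets = response.dig("aggregations", "user", "buckets") || []

  selected = buckets.select do |bucket|
    # A missing or null value is treated as a count of 0.
    count = bucket.dig("count", "value").to_i
    compare.call(count, n) != negate
  end

  selected.map { |bucket| bucket["key"] }
end
```

With negate: true the same comparison serves every "did not" variant, so no separate script per negated condition is needed. Users that have no bucket in the response are still not covered.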

hadii · July 9, 2022, 5:21am

Thanks, @Ayush_Mathur, for your reply. I think I have to explain my case in more detail. In my system I have specific actions such as "app_open", "sign_up", and so on, and a log is generated whenever a user performs one of them. Of course, a user can "app_open" without doing "sign_up". My problem is that when an admin searches these logs, he can search for something like "user did not at most 0 sign up". To interpret this sentence: "did not" means must not in my system, and for "at most" I use comparison operators in the query I described before to find my results. As far as I know, Elasticsearch supports "not equal" and negated comparisons only in the query string and bool queries. So the last query should really be read as "all users who signed up at least n + 1 times". As @Tomo_M said, I am thinking about a date range, but I don't know whether it is right to add a date range to this query or not.

system · August 6, 2022, 5:21am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
