Percentile_ranks for values in database


#1

Hi guys!

I have data that looks like this:
{
  "user": {
    "id": 1,
    "money": 100,
    "points": 50
  },
  .....................
}

I'm trying to build a query that returns a percentile rank for each of these fields (money and points). Ideally the results would look like this:

UserId | percentile_rank_money | percentile_rank_points
or at least
percentile_rank_money | percentile_rank_points

In the SQL world I can achieve this using "PERCENT_RANK() OVER (PARTITION BY ...)".
But in Elasticsearch I have to specify a list of values for 'percentile_ranks', and because I have more than 2 columns to rank, sending a separate request for every single one is not a good idea.
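For anyone comparing the two worlds, here is a rough Python sketch of what PERCENT_RANK computes over a single partition: (rank - 1) / (n - 1), with ties sharing the lowest rank. The sample users and the helper name percent_rank are made up for illustration. Note that SQL returns fractions between 0 and 1, while the Elasticsearch aggregation reports approximate percentages, so the numbers are not expected to match exactly.

```python
def percent_rank(values):
    """SQL-style PERCENT_RANK for each value, ascending order, one partition."""
    n = len(values)
    if n <= 1:
        return [0.0] * n
    ordered = sorted(values)
    # The index of a value's first occurrence in sorted order equals rank - 1,
    # so tied values share the lowest rank.
    first_index = {}
    for i, v in enumerate(ordered):
        first_index.setdefault(v, i)
    return [first_index[v] / (n - 1) for v in values]


# Hypothetical sample documents, shaped like the data in the question.
users = [
    {"id": 1, "money": 100, "points": 50},
    {"id": 2, "money": 300, "points": 10},
    {"id": 3, "money": 200, "points": 50},
]

money_ranks = percent_rank([u["money"] for u in users])
points_ranks = percent_rank([u["points"] for u in users])

# UserId | percentile_rank_money | percentile_rank_points
table = [(u["id"], m, p) for u, m, p in zip(users, money_ranks, points_ranks)]
for row in table:
    print(row)
```

This is the per-user table the question is after; the open point is how to get Elasticsearch to produce it without knowing the values up front.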

Is it possible to achieve such a result?
Thanks in advance


(Zachary Tong) #2

I'm not sure I understand the issue. You can include both percentile ranks in the same query:

{
  "aggs": {
    "users": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "ranks_points": {
          "percentile_ranks": {
            "field": "points"
          }
        },
        "ranks_money": {
          "percentile_ranks": {
            "field": "money"
          }
        }
      }
    }
  }
}

#3

Hi Zachary,
Thanks for your reply.
The problem is that if I use percentile_ranks I get an error: Required [values]. So AFAIK I have to specify a list of values, which I don't know without first reading all of them from Elasticsearch.


(Zachary Tong) #4

Sorry, I'm still a bit confused.

You have to specify values because that's what the percentile_ranks aggregation does: it tells you the rank of specific values that you care about.

If you just want to know the 90th percentile and find out what the value is at that point, you should use the percentiles aggregation instead.

Basically, percentile_ranks is for asking what the rank of specific, known values is. percentiles is for asking what value is greater than n percent of the data.
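To make the difference concrete, here is a small sketch of the two request bodies, written as Python dicts and printed as JSON. The field names come from this thread; the aggregation names and the sample values and percents are placeholders, not anything taken from the original data.

```python
import json

# percentile_ranks: "what rank do these specific, known values have?"
# The "values" list is required; 100, 500 and 50 are placeholder numbers.
ranks_body = {
    "size": 0,
    "aggs": {
        "ranks_money": {
            "percentile_ranks": {"field": "money", "values": [100, 500]}
        },
        "ranks_points": {
            "percentile_ranks": {"field": "points", "values": [50]}
        },
    },
}

# percentiles: "what value sits at the n-th percentile?"
# The "percents" list here is a placeholder choice.
percentiles_body = {
    "size": 0,
    "aggs": {
        "money_percentiles": {
            "percentiles": {"field": "money", "percents": [50, 90]}
        }
    },
}

print(json.dumps(ranks_body, indent=2))
print(json.dumps(percentiles_body, indent=2))
```

Both fields fit in one request, but each percentile_ranks aggregation still needs its own list of values.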


#5

What can I do if I don't know the value?
I'm trying to find something similar to this SQL query:
SELECT UserId, PERCENT_RANK() OVER (PARTITION BY UserTypeId ORDER BY money DESC) AS MoneyRank
FROM blablabla
WHERE blablabla

So, I'm interested in percentile_ranks for all users that satisfy some filter expression, and I have no idea what their values are.
So far I have found only one workaround for Elasticsearch: read all users that satisfy the filter expression, collect their Money field values, pass those values to percentile_ranks, and then match the returned ranks back to the users I read in the first step. From my point of view that's a bit complicated compared to a single SQL query, but for now I don't have a better solution.
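The workaround could be sketched roughly as below. This only builds the second request and joins the result; it does not talk to a cluster. The helper names, the sample users, and the fake response are all made up for illustration, and the rank numbers in the fake response are placeholders, not real Elasticsearch output. It assumes the "keyed": false option so that ranks come back as a list of key/value pairs.

```python
def build_ranks_body(filter_query, field, values):
    """Request body asking for the percentile rank of each known value."""
    return {
        "size": 0,
        "query": filter_query,
        "aggs": {
            "ranks": {
                "percentile_ranks": {
                    "field": field,
                    "values": sorted(set(values)),
                    "keyed": False,  # list of {"key", "value"} pairs
                }
            }
        },
    }


def attach_ranks(users, field, ranks_agg):
    """Match each user's field value to the rank returned for that value."""
    rank_by_value = {
        float(item["key"]): item["value"] for item in ranks_agg["values"]
    }
    return [
        {"id": u["id"], "rank_" + field: rank_by_value[float(u[field])]}
        for u in users
    ]


# Step 1 result: users matching the filter (hypothetical sample documents).
users = [
    {"id": 1, "money": 100},
    {"id": 2, "money": 300},
    {"id": 3, "money": 200},
]

# Step 2: body for the percentile_ranks request, using the values just read.
body = build_ranks_body({"match_all": {}}, "money", [u["money"] for u in users])

# Placeholder standing in for response["aggregations"]["ranks"]; the numbers
# are invented to show the shape only.
fake_ranks_agg = {"values": [
    {"key": 100.0, "value": 10.0},
    {"key": 200.0, "value": 50.0},
    {"key": 300.0, "value": 90.0},
]}

# Step 3: join ranks back onto the users.
ranked = attach_ranks(users, "money", fake_ranks_agg)
print(ranked)
```

For more than one field, the same request could carry one percentile_ranks aggregation per field, so it stays at two round trips in total rather than one per column.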

