Slow elasticsearch sql query with aggregation to get Top-N results

Hi,
I am trying to visualize some data and I am using elasticsearch sql to build some visualizations. I am importing data about 346M with 2000000 docs.
image

This is my slow sql:

select  userId, count(*) from tableA where metricTime>0 group by userId order by cnt limit 10

but it cost 1min to get results.

select  userId, count(*) from tableA where metricTime>0 group by userId

this query is fast, only 140ms.

This is there response pic:

The dataset has 597443 distinct userId(actually 600000 because of the HyperLogLog++ algorithm used in elasticsearch ) and 2000000 lines. image

So, my question is why the elasticsearch sorts on aggregate to get Top-N results slow? Or it is naturally? Or how can I improve it ?

@letwefly ES-SQL uses a composite aggregation to run any query that needs an aggregation in it, for the simple reason that it has pagination support. But composite does come with a disadvantage as well: it cannot sort on anything else other than its keys.

In your query case, though, the sorting needs to happen on the bucket size. On the server side (ES) this is not possible, so ES-SQL implemented client-side sorting in this PR. Sorting being performed client-side, I guess this is where the slowdown comes from.

Thanks. And is there anything we can do to speed up the query? Or to by-pass the client-side limit? Anyway what can i do if i want to get the Top-N sorting results in a short time(several seconds not minutes)?

There is no "bypassing" the limit, as this is how sorting by COUNT(*), in this specific case, works.

"Top-N" can mean several things, depending on how you sort the results. As I mentioned, the problem here is sorting by COUNT(*) and aggregating (group by userId) at the same time and there is no way around it, unless you change your query.

I think I was not clear in my previous message - this is actually a limitation in Elasticsearch and not in SQL. Elastisearch doesn't know how to sort the aggregation results by their COUNT. SQL just tries to offer the functionality and the only way it can do this is client-side (in memory).

Thank you very much for your kindness.:grin:

Apologies @letwefly. Re-reading my earlier message, my tone wasn't the kindest one :-).

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.