Slow elasticsearch sql query with aggregation to get Top-N results

letwefly · October 24, 2019, 2:01am

Hi,
I am trying to visualize some data and I am using elasticsearch sql to build some visualizations. I am importing data about 346M with 2000000 docs.

This is my slow sql:

select  userId, count(*) from tableA where metricTime>0 group by userId order by cnt limit 10

but it cost 1min to get results.

select  userId, count(*) from tableA where metricTime>0 group by userId

this query is fast, only 140ms.

This is there response pic:

The dataset has 597443 distinct userId(actually 600000 because of the HyperLogLog++ algorithm used in elasticsearch ) and 2000000 lines.

So, my question is why the elasticsearch sorts on aggregate to get Top-N results slow? Or it is naturally? Or how can I improve it ?

Andrei_Stefan · October 25, 2019, 7:27am

@letwefly ES-SQL uses a composite aggregation to run any query that needs an aggregation in it, for the simple reason that it has pagination support. But composite does come with a disadvantage as well: it cannot sort on anything else other than its keys.

In your query case, though, the sorting needs to happen on the bucket size. On the server side (ES) this is not possible, so ES-SQL implemented client-side sorting in this PR. Sorting being performed client-side, I guess this is where the slowdown comes from.

letwefly · October 28, 2019, 1:04am

Thanks. And is there anything we can do to speed up the query? Or to by-pass the client-side limit? Anyway what can i do if i want to get the Top-N sorting results in a short time(several seconds not minutes)?

Andrei_Stefan · October 29, 2019, 12:32pm

There is no "bypassing" the limit, as this is how sorting by COUNT(*), in this specific case, works.

"Top-N" can mean several things, depending on how you sort the results. As I mentioned, the problem here is sorting by COUNT(*) and aggregating (group by userId) at the same time and there is no way around it, unless you change your query.

I think I was not clear in my previous message - this is actually a limitation in Elasticsearch and not in SQL. Elastisearch doesn't know how to sort the aggregation results by their COUNT. SQL just tries to offer the functionality and the only way it can do this is client-side (in memory).

letwefly · November 1, 2019, 12:54am

Thank you very much for your kindness.

Andrei_Stefan · November 2, 2019, 12:09pm

Apologies @letwefly. Re-reading my earlier message, my tone wasn't the kindest one :-).

system · November 30, 2019, 12:10pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Problems with "Top N" SQL query Elasticsearch elastic-stack-sql	2	905	October 10, 2019
Slow aggregation queries, only after data change (ES 2.3) Elasticsearch	9	1321	December 26, 2016
Elastic Search Aggregations Slow Elasticsearch	21	2623	November 26, 2021
Very slow aggregate query Elasticsearch	16	1643	November 12, 2021
Aggregation running very slow Elasticsearch es-hadoop	8	2998	July 6, 2017

Slow elasticsearch sql query with aggregation to get Top-N results

Related topics