I am running the following ES|QL query against 192M documents:
from filebeat-audit-*
| where user.name is not null AND source.ip is not null and host.name is not null
| where event.outcome == "success" and cidr_match(source.ip,"2.56.0.0/16")
| stats cnt = count_distinct(source.ip) by user.name
| keep cnt, user.name
| where cnt > 1
| limit 10000
and it returns this error:
[esql] > Unexpected error from Elasticsearch: circuit_breaking_exception - [request] Data too large, data for [<reused_arrays>] would be [21654658976/20.1gb], which is larger than the limit of [18038862643/16.7gb]
But if I remove the STATS command, the query returns only 305 documents (those whose source.ip matches 2.56.0.0/16). Why can't Elasticsearch run STATS over such a small result? This is the query without STATS:
from filebeat-audit-*
| where user.name is not null AND source.ip is not null and host.name is not null
| where event.outcome == "success" and cidr_match(source.ip,"2.56.0.0/16")
| limit 10000
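To make the semantics of the two queries above concrete, here is a minimal Python sketch over a handful of made-up documents. The DOCS list, NET, and the function names are hypothetical and only mirror the field names used in the queries. It computes the distinct count exactly with Python sets, whereas ES|QL's COUNT_DISTINCT is documented as approximate (HyperLogLog++); the sketch says nothing about how Elasticsearch allocates memory, so it does not explain the circuit breaker, it only shows what result the query is expected to produce.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical sample documents; field names mirror the filebeat-audit-* fields above.
DOCS = [
    {"user.name": "alice", "source.ip": "2.56.1.10", "host.name": "h1", "event.outcome": "success"},
    {"user.name": "alice", "source.ip": "2.56.7.3", "host.name": "h2", "event.outcome": "success"},
    {"user.name": "alice", "source.ip": "2.56.1.10", "host.name": "h1", "event.outcome": "success"},
    {"user.name": "bob", "source.ip": "2.56.9.9", "host.name": "h1", "event.outcome": "success"},
    {"user.name": "bob", "source.ip": "10.0.0.1", "host.name": "h1", "event.outcome": "success"},
    {"user.name": "carol", "source.ip": "2.56.4.4", "host.name": None, "event.outcome": "success"},
    {"user.name": "dave", "source.ip": "2.56.5.5", "host.name": "h3", "event.outcome": "failure"},
    {"user.name": "dave", "source.ip": "2.56.6.6", "host.name": "h3", "event.outcome": "success"},
]

NET = ip_network("2.56.0.0/16")


def passes_where(doc):
    """Both WHERE clauses: non-null fields, successful outcome, CIDR match on source.ip."""
    if any(doc.get(f) is None for f in ("user.name", "source.ip", "host.name")):
        return False
    return doc.get("event.outcome") == "success" and ip_address(doc["source.ip"]) in NET


def filter_only(docs, limit=10000):
    """The query without STATS: just the matching documents, capped by LIMIT."""
    return [d for d in docs if passes_where(d)][:limit]


def distinct_ips_per_user(docs, limit=10000):
    """STATS cnt = COUNT_DISTINCT(source.ip) BY user.name | WHERE cnt > 1 | LIMIT, computed exactly."""
    ips_by_user = defaultdict(set)
    for doc in docs:
        if passes_where(doc):
            ips_by_user[doc["user.name"]].add(doc["source.ip"])
    rows = [(len(ips), user) for user, ips in ips_by_user.items() if len(ips) > 1]
    return rows[:limit]


print(len(filter_only(DOCS)))
print(distinct_ips_per_user(DOCS))
```

In this sample, five documents pass the filters and only alice has more than one distinct IP in the range.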
I found a workaround: first collapse the data to one row per (user.name, source.ip) pair, then count distinct IPs per user:
from filebeat-audit-*
| where user.name is not null and source.ip is not null and host.name is not null and event.outcome == "success" and cidr_match(source.ip, "2.56.0.0/16")
| STATS SUM(1) BY user.name, source.ip
| STATS cnt = count_distinct(source.ip) by user.name
| keep cnt, user.name
| WHERE cnt > 1
| limit 10000
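Why the two-stage workaround returns the same answer can be sketched in a few lines of Python. The FILTERED list and the function names are hypothetical; the rows stand for (user.name, source.ip) pairs that already passed the WHERE clause. After the first STATS each pair appears exactly once, so counting rows per user in the second stage already equals the number of distinct IPs. That suggests the second STATS could presumably use a plain count (for example COUNT(*)) instead of COUNT_DISTINCT, but that variant has not been tested against the real index here.

```python
from collections import Counter

# Hypothetical (user.name, source.ip) rows that already passed the WHERE filters.
FILTERED = [
    ("alice", "2.56.1.10"),
    ("alice", "2.56.1.10"),
    ("alice", "2.56.7.3"),
    ("bob", "2.56.9.9"),
    ("bob", "2.56.9.9"),
]


def two_stage(rows, limit=10000):
    # Stage 1 (STATS SUM(1) BY user.name, source.ip): collapse to distinct pairs.
    pairs = set(rows)
    # Stage 2: each (user, ip) pair is now unique, so a row count per user
    # equals the number of distinct IPs for that user.
    cnt = Counter(user for user, _ip in pairs)
    # WHERE cnt > 1 | LIMIT 10000 (sorted only to make the output deterministic).
    return sorted((n, user) for user, n in cnt.items() if n > 1)[:limit]


def one_stage(rows, limit=10000):
    # Reference: exact distinct count of IPs per user using one set per user.
    ips = {}
    for user, ip in rows:
        ips.setdefault(user, set()).add(ip)
    return sorted((len(s), user) for user, s in ips.items() if len(s) > 1)[:limit]


print(two_stage(FILTERED))
print(one_stage(FILTERED))
```

Both functions agree on this sample; the difference in the real engine is how much intermediate state each aggregation needs, which this sketch does not model.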
It's good that you found a workaround, but the original question is still interesting.
Would you (or others) consider this a bug? In your case the original ES|QL query failed, but with fewer than 192M documents it might have "worked" while still temporarily consuming a lot of resources (arguably unnecessarily).