My terms aggregation (on a high cardinality field) with an inner cardinality aggregation takes about 23 seconds to complete (one node, one shard, 300,000 documents, keyword fields).
The equivalent SQL query on SQL Server takes a few seconds at most.
Is it reasonable of me to make that comparison? Maybe SQL Server is just better suited for this kind of query?
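For context, the request looks roughly like this (index and field names are illustrative, not my exact mapping):

```json
POST /orders/_search
{
  "size": 0,
  "aggs": {
    "by_order": {
      "terms": { "field": "order_id", "size": 10000 },
      "aggs": {
        "unique_items": {
          "cardinality": { "field": "item_id" }
        }
      }
    }
  }
}
```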
Try lowering the precision_threshold [1] setting to reduce the memory used to calculate these values for each of your many order_id buckets. The default value is 3,000, and I imagine the average number of items per order falls well below that.
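Something along these lines (field names assumed from your description; the value 100 is just an example):

```json
"aggs": {
  "by_order": {
    "terms": { "field": "order_id" },
    "aggs": {
      "unique_items": {
        "cardinality": {
          "field": "item_id",
          "precision_threshold": 100
        }
      }
    }
  }
}
```

Lower values use less memory per bucket, at the cost of counting accuracy once the true number of distinct values exceeds the threshold.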
Thanks! Didn't know it had this kind of effect on memory consumption!
This does indeed bring my memory consumption below the circuit breaker limit.
Performance wasn't affected that much by this change, but it does help.
Side note - my comparison with SQL Server was flawed because of caching.