ES Hadoop push down aggregations or not?

I am on ELK 7.13.0 and I have a super very slow query:

events_df = es_reader.load("priam_unified_host-{0}/unified-host".format(date))

counts_df = events_df.where('EventID == 4688').groupBy(col("LogHost"),col("ProcessName")).count()
counts_df.explain(extended=True)

This is the execution plan:


== Physical Plan ==
*(2) HashAggregate(keys=[LogHost#1347, ProcessName#1354], functions=[count(1)], output=[LogHost#1347, ProcessName#1354, count#1408L])
+- Exchange hashpartitioning(LogHost#1347, ProcessName#1354, 200), ENSURE_REQUIREMENTS, [id=#258]
   +- *(1) HashAggregate(keys=[LogHost#1347, ProcessName#1354], functions=[partial_count(1)], output=[LogHost#1347, ProcessName#1354, count#1413L])
      +- *(1) Project [LogHost#1347, ProcessName#1354]
         +- *(1) Filter (isnotnull(EventID#1345L) AND (EventID#1345L = 4688))
            +- *(1) Scan ElasticsearchRelation(Map(es.inferschema -> false, es.net.http.auth.user -> elastic, es.net.ssl.cert.allow.self.signed -> true, es.net.http.auth.pass -> 123456, es.read.field.as.array.include -> tags, es.resource -> priam_unified_host-2021-05-03/unified-host, es.nodes -> elasticsearch:9200, es.net.ssl -> true),org.apache.spark.sql.SQLContext@515d109c,None) [LogHost#1347,ProcessName#1354,EventID#1345L] PushedFilters: [IsNotNull(EventID), EqualTo(EventID,4688)], ReadSchema: struct<LogHost:string,ProcessName:string,EventID:bigint>

Does it mean that it is effectively scanning the index (via scrolling???) and not leveraging the Aggregates constructs in ES?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.