I can't find a way in the docs to solve this problem: if we count cached data with rdd.count or dataframe.count, it is too slow, and with larger data sets the count never returns at all. How can we make it run quickly?
In elastic4s we can solve this with "search in index/type query xxx size 0" and read the total from the hits in the response.
Doing the query manually through elastic4s might remain the only option for the time being. count was originally implemented to do just that; however, its semantics changed since in Spark, count actually instantiates all entries.
Going forward we might add an esCount method, though that implies the RDD in question is an ES one. Alternatively it could be bound to SparkContext/SQLContext.
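As a workaround, the manual count-via-search approach mentioned above might look roughly like the sketch below. This assumes the older elastic4s DSL (the "search in index/type" style quoted in the question), a cluster reachable on localhost:9300, and placeholder index/type names myindex/mytype; adjust all of these to your setup.

```scala
// Sketch only: counts documents by issuing a search with size 0,
// so Elasticsearch returns the total hit count without fetching
// (or instantiating) any documents, unlike Spark's rdd.count.
import com.sksamuel.elastic4s.ElasticClient
import com.sksamuel.elastic4s.ElasticDsl._

val client = ElasticClient.remote("localhost", 9300) // assumed host/port

val resp = client.execute {
  // "myindex" / "mytype" are placeholders; add a query clause if you
  // only want to count a subset of documents.
  search in "myindex" / "mytype" size 0
}.await

val total = resp.getHits.getTotalHits // the count, no documents loaded
```

This stays fast on large indices because the count is computed server-side by Elasticsearch; nothing is shipped to Spark.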