I'm curious about reaching deeper into the Lucene internals with es-hadoop,
much as the aggregations module does. While aggregations are amazing, there
are cases where they aren't an ideal solution, mainly because the data can't
be shuffled/repartitioned as it moves through an analytic. I realize the
current implementation can pull single fields by using an include/exclude on
the query, but since this has to load and filter the _source, it does not
strike me as a performant solution. With an es-spark interface that could
pull doc values/doc ids the way aggregations do, it would be possible to
create arbitrary analytics on any query context. Has any thought been given
to this?
So I guess I missed that fielddata fields could be specified in the search
request body. That's pretty cool!
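
For anyone following along, here is a minimal sketch of what such a search request body might look like. It assumes the `fielddata_fields` key of the 1.x search API (later releases call the equivalent option `docvalue_fields`). The field names `user_id` and `timestamp` and the helper `build_search_body` are made up for illustration, and nothing here confirms that es-hadoop passes this part of the body through unchanged.

```python
import json


def build_search_body(query, fields):
    """Build a search request body that asks for per-hit values from
    fielddata rather than from the stored _source document.

    `query` is any query DSL dict; `fields` are the (hypothetical) field
    names whose fielddata values should be returned with each hit.
    """
    return {
        "query": query,
        # Skip loading and parsing _source for each hit.
        "_source": False,
        # Return these fields from fielddata (assumed 1.x key name).
        "fielddata_fields": list(fields),
    }


# Hypothetical field names, for illustration only.
body = build_search_body({"match_all": {}}, ["user_id", "timestamp"])
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent as the body of a `_search` request; each hit then carries the requested values under `fields` instead of a `_source` document.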
On Thursday, January 29, 2015 at 1:52:10 PM UTC-5, Elliott Bradshaw wrote:
> I'm curious about reaching deeper into the lucene internals with
> es-hadoop, in a similar way that the aggregations module works. [...]