ES seems to have the ability to run analytic queries. I have read about people
using it as an OLAP solution [1], although I have not yet read anyone
describe their experience. In that respect, how do ES's analytics
capabilities compare against:

- Dremel clones [2] like Impala & Presto (for near real-time, ad hoc
  analytic queries over large datasets)
- Lambda Architecture [3] systems (where queries are known up front, but
  need to run against a large dataset)

Does anyone here have experience running ES in such use cases, beyond the
free-text search that ES is well known for?
I have to admit I don't know much about the other systems you mentioned.
What I can say is that these analytic queries are executed in a similar
way, in the sense that they run over a column-oriented view of the
data (called fielddata in Elasticsearch). You can also read this thread [1],
which highlights similarities between Parquet and Lucene doc values (which
Elasticsearch can use as a fielddata backend).
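To make the "column-oriented view" concrete, here is a minimal Python sketch. It is only a toy in-memory model, not how Elasticsearch or Lucene store fielddata; the documents, field names, and the helpers `to_columns` and `average` are all made up for illustration.

```python
from typing import Any

# Hypothetical documents, stored row-wise as they would be indexed.
docs = [
    {"country": "fr", "status": "ok", "latency_ms": 120},
    {"country": "us", "status": "error", "latency_ms": 340},
    {"country": "fr", "status": "ok", "latency_ms": 95},
    {"country": "de", "status": "ok", "latency_ms": 210},
]


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented documents into one list per field, indexed by doc ID."""
    fields = sorted({name for row in rows for name in row})
    return {name: [row.get(name) for row in rows] for name in fields}


def average(column: list[Any]) -> float:
    """Average a numeric column, skipping documents that lack the field."""
    values = [v for v in column if v is not None]
    return sum(values) / len(values)


columns = to_columns(docs)

# The aggregation reads a single column and never touches the other fields.
avg_latency = average(columns["latency_ms"])
print(avg_latency)
```

The point of the layout is that an aggregation over one field scans one contiguous list instead of loading every full document.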
Things I would expect Elasticsearch to do well compared to these systems:

- Working with strings: because storage is segment-based, every term can be
  identified by an ordinal, which can be used to make some computations very
  fast (e.g. terms aggregations [2]).
- Slicing and dicing data: thanks to its inverted index, Elasticsearch can
  very quickly filter documents that match specific criteria and run
  analytics only on this subset of the data.
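The two points above can be sketched together in a few lines of Python. This is a toy model under simplifying assumptions (a single in-memory "segment", one value per document, hypothetical data and helper names), not Lucene's actual data structures: an inverted index selects the matching doc IDs, then a terms-style count runs over integer ordinals instead of strings.

```python
from collections import defaultdict

# Hypothetical per-document values; the list index is the doc ID.
country = ["fr", "us", "fr", "de", "us", "fr"]
status = ["ok", "error", "ok", "ok", "ok", "error"]


def build_ordinals(values):
    """Return (sorted term dictionary, per-document ordinals)."""
    terms = sorted(set(values))
    ordinal_of = {term: i for i, term in enumerate(terms)}
    return terms, [ordinal_of[v] for v in values]


def build_inverted_index(values):
    """Map each term to the ascending list of doc IDs that contain it."""
    postings = defaultdict(list)
    for doc_id, value in enumerate(values):
        postings[value].append(doc_id)
    return dict(postings)


def terms_aggregation(doc_ids, terms, ordinals):
    """Count documents per term using an integer array indexed by ordinal."""
    counts = [0] * len(terms)
    for doc_id in doc_ids:
        counts[ordinals[doc_id]] += 1  # integer increment, no string hashing
    # Strings are looked up only once per bucket, at the very end.
    return {terms[i]: n for i, n in enumerate(counts) if n > 0}


country_terms, country_ords = build_ordinals(country)
status_index = build_inverted_index(status)

# "Slice" first: only documents whose status is "ok" reach the aggregation.
matching = status_index.get("ok", [])
result = terms_aggregation(matching, country_terms, country_ords)
print(result)
```

In this sketch the filter costs one dictionary lookup, and the per-document work in the aggregation is a single array increment, which is the intuition behind both bullet points.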
The other tools you mentioned probably have their own strengths as well, but
I don't know them well enough to tell you what they would bring compared to
Elasticsearch.