Elasticsearch Analytics Capabilities


(Binil Thomas) #1

ES seems to have the ability to run analytic queries. I have read about people
using it as an OLAP solution [1], although I have not yet read anyone
describe their experience. In that respect, how do ES's analytics
capabilities compare against:

  1. Dremel clones [2] like Impala & Presto (for near real-time, ad hoc
    analytic queries over large datasets)
  2. Lambda Architecture [3] systems (where queries are known up front, but
    need to run against a large dataset)

Does anyone here have experience running ES in such use cases, beyond the
free-text search that ES is well known for?

Thanks,
Binil

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5c75a380-3971-45cd-b10d-a91b3b97ecc3%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Adrien Grand) #2

Hi,

I have to admit I don't know much about the other systems you mentioned.
What I can say is that these analytics queries are run in a similar way,
in the sense that they run over a column-oriented view of the
data (called fielddata in Elasticsearch). You can also read this thread,
which highlights similarities between Parquet and Lucene doc values (which
Elasticsearch can use as a fielddata backend) [1].
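
To make "column-oriented view" concrete, here is a minimal Python sketch. It is only a toy model: the documents, the field names (`user`, `bytes`) and the `to_columns` helper are made up for illustration and do not reflect how fielddata or doc values are actually implemented.

```python
# Toy illustration only: hypothetical documents and field names,
# not Elasticsearch's actual fielddata or doc values implementation.
docs = [
    {"user": "alice", "bytes": 120},
    {"user": "bob", "bytes": 300},
    {"user": "alice", "bytes": 80},
]


def to_columns(rows):
    """Pivot row-oriented documents into one list per field, indexed by doc id."""
    columns = {}
    for doc_id, row in enumerate(rows):
        for field, value in row.items():
            # Create the column on first sight, then fill the slot for this doc.
            columns.setdefault(field, [None] * len(rows))[doc_id] = value
    return columns


columns = to_columns(docs)

# An aggregation over "bytes" scans a single column and never reads "user".
total_bytes = sum(columns["bytes"])
```

The point of the layout is that a query touching one field only reads that field's values, which is the property these systems share.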

Things I would expect Elasticsearch to do well compared to these systems
are:

  • working with strings: because storage is segment-based, every term can be
    identified by an ordinal, which can be used to make some computations very
    fast (e.g. terms aggregations [2])
  • slicing and dicing data: thanks to its inverted index, Elasticsearch can
    very quickly filter documents that match specific criteria and run
    analytics only on this subset of the data.
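
The two points above can be sketched together in a few lines of Python. This is a toy model under stated assumptions, not Lucene's data structures: it pretends there is a single segment, and the documents, the field names (`status`, `country`) and the `terms_agg` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents; "status" and "country" are made-up field names.
docs = [
    {"status": "error", "country": "fr"},
    {"status": "ok", "country": "us"},
    {"status": "error", "country": "us"},
    {"status": "error", "country": "fr"},
]

# Inverted index: (field, term) -> list of matching doc ids, in doc id order.
inverted = defaultdict(list)
for doc_id, doc in enumerate(docs):
    for field, term in doc.items():
        inverted[(field, term)].append(doc_id)

# Ordinals: each distinct term of a field gets a small integer, in sorted order.
terms = sorted({doc["country"] for doc in docs})  # ordinal -> term
ordinal_of = {term: i for i, term in enumerate(terms)}
country_ords = [ordinal_of[doc["country"]] for doc in docs]  # doc id -> ordinal


def terms_agg(doc_ids):
    """Count documents per term using integer ordinals, resolving strings last."""
    counts = [0] * len(terms)
    for doc_id in doc_ids:
        counts[country_ords[doc_id]] += 1
    return {terms[o]: c for o, c in enumerate(counts) if c}


matching = inverted[("status", "error")]  # filter first, via the inverted index
result = terms_agg(matching)              # then aggregate only that subset
```

The counting loop only increments slots in an integer array; strings are looked up once per bucket at the end, which is the kind of shortcut ordinals make possible.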

The other tools you mentioned probably have advantages as well, but I don't
know them well enough to tell you what they would bring compared to
Elasticsearch.

[1] http://lucene.472066.n3.nabble.com/Parquet-dictionary-encoding-amp-bit-packing-td4090238.html
[2] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html

(In reply to Binil Thomas's message of Thu, Feb 13, 2014 at 8:57 PM.)

--
Adrien Grand



(system) #3