ES as a timeseries metrics store?

In the most recent announcement
(http://elasticsearch.com/blog/welcome-jordan-logstash/)
the following caught my eye:

"Another example is a company which uses Logstash and Elasticsearch not
only for all their application logs, but also for all of their application
metrics. The ability to tie metrics indicating high CPU usage to a log
message of “mmm, we shouldn’t really get here” has proven to be invaluable
more than once."

I've been looking a lot at dedicated multi-server timeseries metrics stores
(opentsdb, kairosdb, blueflood etc) and I wonder how ES compares.
Specifically, I'm looking for a system that does:

  • load distribution across servers (reads and writes),
  • HA (data in+out should be 100% available in case of node failures) and
    self healing (replication) in case nodes go down.
  • the ability to aggregate data in the storage system itself (i.e. given
    minutely datapoints, give me the per-hour max, min, or avg (mean), or
    histograms across a certain time range)
  • easy or trivial to deploy (I know ES kicks ass here)

nice to have:

  • balance data to prevent nodes with unequal disk capacities running full.

I'm not sure how appropriate ES is because:

  • there's no need for a flexible schema. datapoints are basically (timestamp,
    value) pairs, not "documents", and they need no unique ids. I'm concerned
    about storage space and performance here. (value could be float or int)
  • fixed intervals mean it should be easy to seek to the location of the
    data with no real indices needed, and in fact timestamps don't need
    to be stored at all since they are implicit. does this work with ES?
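
To make the implicit-timestamp idea concrete, here is a minimal sketch in
Python of what I mean. It is not ES code; the class and method names are made
up for illustration, and it assumes every slot gets exactly one value.

```python
from array import array


class FixedIntervalSeries:
    """Values recorded at a fixed interval; timestamps are implied by position.

    Hypothetical illustration only, not an ES data structure.
    """

    def __init__(self, start_ts, interval_s):
        self.start_ts = start_ts      # epoch seconds of slot 0
        self.interval_s = interval_s  # e.g. 60 for minutely datapoints
        self.values = array("d")      # 8 bytes per value, no stored timestamps

    def append(self, value):
        self.values.append(value)

    def slot(self, ts):
        # Seeking is plain arithmetic; no index lookup is needed.
        return (ts - self.start_ts) // self.interval_s

    def at(self, ts):
        return self.values[self.slot(ts)]

    def timestamp_of(self, slot):
        # The timestamp is recomputed from the position, never stored.
        return self.start_ts + slot * self.interval_s


series = FixedIntervalSeries(start_ts=1377000000, interval_s=60)
for v in (0.5, 0.7, 0.9):
    series.append(v)
```

Here `series.at(1377000060)` reads the second slot directly, which is the
kind of seek I'd hope a metrics store could do cheaply.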

I noticed a lot of the information about ES is "top down" (what the
features are and how to use them). is there any "bottom-up" material that
conveys the internal design of ES, and what makes it applicable (or not)
for certain use cases?

  • If I wanted to build a metrics system with ES, how would I go about it?
    any tutorials?

thanks,
Dieter

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

If you can timestamp your metrics at millisecond rather than nanosecond
resolution, you're fine with ES, because its date fields hold milliseconds
since the epoch as a long.

Consider which value types you want to store. In most cases an event carries
many kinds of values; let those simply be the source of the event. Then
create one document per event, with many fields to represent that event.
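
As a rough sketch of "one document per event", the Python below builds the
JSON body for a single event. The field names (`@timestamp`, `host`,
`cpu_user` and so on) and the helper `make_event_doc` are assumptions for
illustration, not anything ES requires; how you send the body to ES is left
out.

```python
import json


def make_event_doc(ts_ms, host, fields):
    """Build one document for one event: a timestamp plus many value fields.

    Hypothetical helper; field names are illustrative assumptions.
    """
    doc = {"@timestamp": ts_ms, "host": host}  # milliseconds since the epoch
    doc.update(fields)                         # one field per measured value
    return doc


doc = make_event_doc(
    1377000000000,
    "web01",
    {"cpu_user": 12.5, "cpu_system": 3.1, "load1": 0.42},
)
body = json.dumps(doc, sort_keys=True)  # JSON source to index as one document
```

Values of different types (float, int, string) can sit side by side in the
same document this way.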

There is a lot you can tune regarding storage space. There are tradeoffs,
but by default, the index is compressed.

You can also index by fixed intervals and store whatever you like in
there, one document per fixed interval.
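
A minimal sketch of the one-document-per-interval layout, in Python. The
hourly interval, the document shape, and the names `interval_start` and
`docs_per_interval` are all assumptions made for the example.

```python
from collections import defaultdict

INTERVAL_MS = 3600 * 1000  # assumed interval: one hour, in milliseconds


def interval_start(ts_ms, interval_ms=INTERVAL_MS):
    """Floor a millisecond timestamp to the start of its interval."""
    return ts_ms - ts_ms % interval_ms


def docs_per_interval(metric, points, interval_ms=INTERVAL_MS):
    """Group (ts_ms, value) pairs into one document per fixed interval."""
    buckets = defaultdict(list)
    for ts_ms, value in points:
        buckets[interval_start(ts_ms, interval_ms)].append(value)
    return [
        {
            "id": "%s-%d" % (metric, start),  # hypothetical document id
            "metric": metric,
            "start": start,
            "values": values,
        }
        for start, values in sorted(buckets.items())
    ]


docs = docs_per_interval("cpu_user", [(0, 1.0), (60000, 2.0), (3600000, 3.0)])
```

The three sample points end up in two documents, one per hour, which keeps
the document count far below one per datapoint.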

Note that ES is weak when it comes to computation over your stored
values (for example, performing statistical or sorting procedures on huge
numbers of values while indexing). Also, there is no stream processing API.
Currently, this has to be done by post-processing query results outside of
ES, with a client that pulls the data. The upcoming ES 1.0 will bring an
aggregation framework with many exciting features.
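
As a rough sketch of such client-side post-processing, the Python below
rolls minutely datapoints up to per-hour min, max and avg. It assumes the
client has already pulled the query results into a list of (ts_ms, value)
pairs; `hourly_rollup` is a hypothetical name, not an ES API.

```python
from collections import defaultdict

HOUR_MS = 3600 * 1000


def hourly_rollup(points):
    """Compute per-hour min, max and avg from (ts_ms, value) pairs."""
    buckets = defaultdict(list)
    for ts_ms, value in points:
        buckets[ts_ms - ts_ms % HOUR_MS].append(value)  # floor to the hour
    return {
        hour: {"min": min(vs), "max": max(vs), "avg": sum(vs) / len(vs)}
        for hour, vs in sorted(buckets.items())
    }


# Sample data: two points in the first hour, one in the second.
points = [(0, 1.0), (60000, 3.0), (HOUR_MS, 10.0)]
rollup = hourly_rollup(points)
```

The cost of this approach is that every raw datapoint in the time range has
to travel to the client before it can be reduced.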

I do not fully understand what you mean by "bottom up" material. Do you
want to read about Lucene internals or Java implementations and other
boring details? Or how an event pump can be built to push data into ES?

Jörg
