Hello everybody.
I didn't see any links to these benchmarks in this topic, so I'll post them here. They were made by InfluxData staff, but they are still something we should take into account. The test methodology is open and can be reproduced by anyone.
- May 12, 2016 InfluxDB v0.13.0 vs Elasticsearch v5.0.0-alpha1
https://www.influxdata.com/influxdb-markedly-elasticsearch-in-time-series-data-metrics-benchmark/
Conclusion: InfluxDB outperformed Elasticsearch in all three tests with 8x greater write throughput, while using 4x less disk space when compared against Elastic’s time series optimized configuration, and delivering 3.5x to 7.5x faster response times for tested queries.
- September 2016 InfluxDB v1.0.0 vs Elasticsearch v5.0.0-alpha5
https://www.influxdata.com/resources/benchmarking-influxdb-vs-elasticsearch-for-time-series/
It's a similar test with a similar result:
InfluxDB outperformed Elasticsearch by 8x when it came to data ingestion
InfluxDB outperformed Elasticsearch by delivering 4x and 16x better compression
InfluxDB outperformed Elasticsearch by 4x to 10x when measuring query performance
Such a margin in InfluxDB's favor is really impressive.
The big difference between Elasticsearch and InfluxDB is the HA solution each of them provides, and which functionality we have to pay for.
You can build a full ES cluster without any functional restrictions for free. InfluxData forces you to buy a subscription if you want to use InfluxEnterprise (shard clustering). Without paying, only a single InfluxDB instance can work with a given set of metadata. (To duplicate data to another server we use the free Influx Relay.)
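To make the "duplicate data to another server" idea concrete, here is a minimal sketch of what such duplication boils down to: the same line-protocol batch is posted to the /write endpoint of every standalone instance. This is not Influx Relay's code. The backend hostnames, the database name and the function name are made up for illustration, and the sketch only builds the requests without sending them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical backend addresses; replace with your own standalone instances.
BACKENDS = ["http://influx-a:8086", "http://influx-b:8086"]


def build_write_requests(db, lines, backends=BACKENDS):
    """Build one InfluxDB 1.x /write request per backend for the same batch."""
    # Line protocol points are separated by newlines in the request body.
    body = "\n".join(lines).encode("utf-8")
    query = urlencode({"db": db})
    return [
        Request(f"{base}/write?{query}", data=body, method="POST")
        for base in backends
    ]


# Example batch with a single point in line protocol (hypothetical data).
requests_out = build_write_requests(
    "metrics", ["cpu,host=web01 usage=0.64 1465839830100400200"]
)
```

Each request would then be sent with urllib.request.urlopen (or any HTTP client). A real relay also has to deal with a backend being down, for example by buffering and retrying, which this sketch leaves out.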
But with an 8x higher ingestion rate, do we really need InfluxDB clustering? It seems that if you exceed the capacity of a single InfluxDB instance, you can start another one somewhere else and redirect part of the payload to it. According to the hardware sizing guide, a single instance can handle up to 250 thousand writes per second, and that's quite a number! https://docs.influxdata.com/influxdb/v1.2/guides/hardware_sizing/
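A rough sketch of that "add another single instance" approach, assuming writes are split by measurement name so that each measurement always lands on the same instance. The instance URLs and function names are hypothetical, and the 250 thousand writes per second figure is just the upper bound quoted above, not a guarantee for your hardware.

```python
import hashlib
import math

# Hypothetical standalone instances; each one is independent, not a cluster.
INSTANCES = ["http://influx-1:8086", "http://influx-2:8086"]

# Upper bound per instance, taken from the sizing figure quoted above.
MAX_WRITES_PER_INSTANCE = 250_000


def instances_needed(writes_per_second, per_instance=MAX_WRITES_PER_INSTANCE):
    """Estimate how many single instances a given write rate would require."""
    return max(1, math.ceil(writes_per_second / per_instance))


def pick_instance(measurement, instances=INSTANCES):
    """Route a measurement to one instance using a stable hash of its name."""
    digest = hashlib.sha1(measurement.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]
```

For example, instances_needed(600_000) gives 3. The price of this scheme is that a query touching measurements on different instances has to be combined on the client side, and changing the number of instances moves measurements around, so it is a workaround rather than a replacement for real clustering.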
Two cents about clustering:
ES is pretty mature: all tasks are done automatically, and you don't need to do anything after increasing the replica count or after one of the nodes has been down. With the free Monitoring feature we always know what's going on in the ES cluster.
InfluxEnterprise looks immature by comparison: after increasing the replica count and adding a new node, we have to do the cluster rebalance manually, copying old shards to the new node from the command line. That is very inconvenient when we have thousands of shards, and it invites human error. So I can't recommend using it today; it definitely needs polishing. The cool dashboard with cluster metrics also costs money; in the free version we have the _internal database, but there is no documentation for it.
My conclusion: in general, pure time-series data should live in InfluxDB (or another TSDB, such as ATSD) and searchable text data in Elasticsearch. God bless Grafana: we can draw graphs from InfluxDB / ES / etc. on the same dashboard.
But it depends on your environment. You can use Elasticsearch as a TSDB, but it's only a matter of time before you run into ES performance limits. To each his own. Using a full-text search engine just for time series may be overkill.
Metrics are becoming a big market with a lot of players in it. ES was built for full-text search, but it is trying to compete with the others. See the whole family of Beats utilities.