We've been using Elastic to capture and consolidate IoT data; however, my experience with Elasticsearch is limited to about six months.
Consider the scenario:
- a large number of IoT devices with known serial numbers send timestamped data
- a separate index contains the geo_point for each device (keyed by the unique serial number)
- data/messages from the devices are enriched at ingest with the matching geo_point (again keyed by serial number)
This is great for creating maps (especially heatmaps).
However, is this the best way to do it? Enriching each and every message with the geo_point info seems wasteful: it duplicates data that could be correlated from another index, and message volume is growing rapidly (six months ago the average was 1M/week; it is now at 5M/week).
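To put a rough number on the duplication cost: a minimal back-of-the-envelope estimate, assuming a geo_point costs on the order of 16 bytes per document (two 8-byte doubles in doc values; real on-disk size varies with indexing structures and compression, so this is an order of magnitude, not a measurement):

```python
# Rough storage cost of duplicating a geo_point into every message.
# BYTES_PER_GEO_POINT is an assumption (lat + lon as doubles);
# MESSAGES_PER_WEEK is the current volume mentioned above.

BYTES_PER_GEO_POINT = 16
MESSAGES_PER_WEEK = 5_000_000

bytes_per_year = BYTES_PER_GEO_POINT * MESSAGES_PER_WEEK * 52
print(f"{bytes_per_year / 1024**3:.1f} GiB/year")  # ≈ 3.9 GiB/year
```

A few GiB per year of duplicated location data is usually cheap compared with the cost of joining at query time, which is worth keeping in mind when weighing the options.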
Any suggestions on how to improve or optimize this would be much appreciated.
Since Elasticsearch does not support joins, the enrichment approach is the one generally recommended, unless you have a custom UI where you can merge data from separate queries (basically performing the join at the application layer).
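An application-layer join is just a hash lookup over the results of two queries. A minimal sketch in Python, using illustrative field names (`serial_number`, `location`) and hard-coded sample data in place of the two Elasticsearch responses:

```python
# Application-layer join: attach each device's geo_point to its
# messages by serial number. Field names are illustrative.

def join_messages_with_locations(messages, locations):
    """Merge message docs with location docs on serial_number."""
    loc_by_serial = {loc["serial_number"]: loc["location"] for loc in locations}
    return [
        {**msg, "location": loc_by_serial.get(msg["serial_number"])}
        for msg in messages
    ]

# Sample results as they might come back from two separate queries:
messages = [
    {"serial_number": "SN-001", "ts": "2023-01-01T00:00:00Z", "value": 21.5},
    {"serial_number": "SN-002", "ts": "2023-01-01T00:00:05Z", "value": 19.8},
]
locations = [
    {"serial_number": "SN-001", "location": {"lat": 52.37, "lon": 4.90}},
    {"serial_number": "SN-002", "location": {"lat": 48.86, "lon": 2.35}},
]

joined = join_messages_with_locations(messages, locations)
print(joined[0]["location"])  # {'lat': 52.37, 'lon': 4.9}
```

Note that this work happens on every request, which is the trade-off against paying the enrichment cost once at ingest.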
You could do this in Vega if enrichment is not an option. In Vega you can run two separate Elasticsearch queries against the two indexes, join the data using transformations, and then render a map from the points. Be aware that Vega has a steep learning curve.
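As a rough illustration of that approach, here is a Vega-Lite sketch using Kibana's Elasticsearch data url and a `lookup` transform. The index names, field paths, and sizes are all placeholders, and details such as nested-field access and map projection will likely need adjusting for a real spec:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": {"index": "device-messages", "body": {"size": 10000}},
    "format": {"property": "hits.hits"}
  },
  "transform": [
    {
      "lookup": "_source.serial_number",
      "from": {
        "data": {
          "url": {"index": "device-locations", "body": {"size": 10000}},
          "format": {"property": "hits.hits"}
        },
        "key": "_source.serial_number",
        "fields": ["_source"]
      },
      "as": ["device"]
    }
  ],
  "mark": "circle",
  "encoding": {
    "longitude": {"field": "device.location.lon", "type": "quantitative"},
    "latitude": {"field": "device.location.lat", "type": "quantitative"}
  }
}
```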
But I would recommend you enrich the data once during ingest and build the visualization off that one index. The Vega route is much less efficient, since it has to repeat all that querying and joining every time the map is rendered.
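The ingest-time setup you describe maps directly onto an enrich policy plus an ingest pipeline. A minimal sketch in Kibana Dev Tools syntax, with illustrative index and field names (`device-metadata`, `serial_number`, `location`):

```
PUT /_enrich/policy/device-locations
{
  "match": {
    "indices": "device-metadata",
    "match_field": "serial_number",
    "enrich_fields": ["location"]
  }
}

POST /_enrich/policy/device-locations/_execute

PUT /_ingest/pipeline/add-device-location
{
  "processors": [
    {
      "enrich": {
        "policy_name": "device-locations",
        "field": "serial_number",
        "target_field": "device"
      }
    }
  ]
}
```

Remember that the enrich index is a snapshot: if device locations change, you need to re-execute the policy for new messages to pick up the updated coordinates.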