I'm setting up an Elasticsearch & Kibana instance aimed at analysing trends over time. Every day we collect from a few new sources, then ingest updated data for the established sources. The issue is that, at the moment, the daily ingest overwrites the original data, when instead we would like the original data to remain so that we can analyse trends, e.g. number of hits per day, changes in metadata, etc.
I feel like I am missing a basic Elasticsearch technique: is it possible to index data with a shared identifier across ingests without deleting the previous data?
This will allow you to set up an alias for reading from and an alias for writing to, and then periodically call the Rollover API so that a new index is created for each ingest (so you don't overwrite yesterday's data with today's, for instance). See the sketch below.
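Here is a minimal sketch of that pattern using the Python `elasticsearch` client (8.x). All the names (`sources-write`, `sources-read`, `sources-000001`, the template name, and the sample document) are illustrative assumptions, not anything from your setup; adjust the connection details and mappings for your cluster.

```python
from elasticsearch import Elasticsearch

# Assumed local cluster; adjust URL and auth for your deployment.
es = Elasticsearch("http://localhost:9200")

# Index template: every index matching the pattern gets the read alias,
# so searches against "sources-read" always cover all daily indices.
es.indices.put_index_template(
    name="sources-template",
    index_patterns=["sources-*"],
    template={"aliases": {"sources-read": {}}},
)

# Bootstrap the first index with the write alias. is_write_index ensures
# indexing requests sent to "sources-write" land in the newest index only.
es.indices.create(
    index="sources-000001",
    aliases={"sources-write": {"is_write_index": True}},
)

# Run once per day (e.g. from cron) before the daily ingest: creates
# sources-000002, sources-000003, ... and moves the write alias forward.
es.indices.rollover(alias="sources-write")

# The daily ingest writes through the write alias. Yesterday's documents
# live in yesterday's index and are never overwritten, even with the
# same document _id, because each day's ingest targets a fresh index.
es.index(index="sources-write", id="source-42", document={"hits": 123})

# Trend analysis reads through the read alias across every daily index,
# so you can chart how a given source changed over time.
es.search(index="sources-read", query={"term": {"_id": "source-42"}})
```

With this arrangement the same document ID can appear once per daily index, which is exactly what you want for trend analysis: each copy is a snapshot of that source on a given day.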