We started using the Elastic Stack for metrics, and now we plan to run some analytics on our data.
I plan to index our "company" table into ES and denormalize some data along with it. One of these new fields is going to be updated from time to time: I need to know whether the company is active or not (meaning it has had at least one sale within the past year).
I'm wondering what the best approach is to update this field. At first I will index all the data with Logstash, but then what? I'm thinking of a CRON job every X days, but is there another way to do it?
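For illustration, the CRON idea could be sketched roughly as follows. This is only a minimal sketch: `is_active`, `flag_changes`, the 365-day window, and the shape of the input tuples are assumptions for the example, not anything from our actual schema. The point is that the job only needs to touch companies whose flag has actually changed.

```python
from datetime import date, timedelta
from typing import Iterable, Iterator, Optional, Tuple

# Assumption: "a year" is treated as a rolling 365-day window.
ACTIVE_WINDOW = timedelta(days=365)


def is_active(last_sale: Optional[date], today: date) -> bool:
    """A company is active if its most recent sale falls inside the window."""
    return last_sale is not None and today - last_sale <= ACTIVE_WINDOW


def flag_changes(
    companies: Iterable[Tuple[int, Optional[date], bool]], today: date
) -> Iterator[Tuple[int, bool]]:
    """Yield (company_id, new_flag) only where the indexed flag is stale.

    Each input tuple is (company_id, last_sale_date, flag_currently_indexed).
    """
    for company_id, last_sale, indexed_flag in companies:
        new_flag = is_active(last_sale, today)
        if new_flag != indexed_flag:
            yield company_id, new_flag
```

Each yielded pair would then become a partial update of the `active` field on the matching document.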
Basically, I'd recommend modifying the application layer if possible and sending data to Elasticsearch in the same "transaction" as you send your data to the database.
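A minimal sketch of that idea, assuming a SQLite database standing in for the real one and a hypothetical `index_doc(company_id, doc)` callable that wraps whatever Elasticsearch client the application uses (both are assumptions for the example):

```python
import sqlite3
from typing import Callable, Dict


def save_company(
    conn: sqlite3.Connection,
    index_doc: Callable[[int, Dict], None],  # hypothetical wrapper around the ES client
    company_id: int,
    name: str,
    active: bool,
) -> Dict:
    """Write the row and index the document as one unit of work."""
    doc = {"name": name, "active": active}
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so a failed indexing call also undoes the database write.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO company (id, name, active) VALUES (?, ?, ?)",
            (company_id, name, int(active)),
        )
        index_doc(company_id, doc)
    return doc
```

This is not a real distributed transaction, which is why "transaction" is in quotes: if the database commit itself fails after the document was indexed, the two systems can still drift apart.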
Thanks, David!
That's exactly what I had planned to do, but what about changes? Take a previously inactive company that starts using our services (and, combined with other data, we know that it is thanks to our email campaign): how do I refresh the data? The real question is how to keep it in sync! I have several ideas, but I would like some feedback from more experienced devs.
Reading your article, I understand that the best approach is to hook into the CRUD operations of the application that writes to the DB in order to keep both in sync.
In my previous job, I did that.
I also created a "reinit job", which basically reindexed my existing database by reading entities and sending them as JSON documents to Elasticsearch.
It was not something we ran every day, only occasionally, when we wanted to change the schema or fix some differences between the two systems.
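As a rough sketch of such a reinit job (not the actual job described above), the snippet below reads every row from a hypothetical `company` table in SQLite and builds the newline-delimited JSON body expected by the Elasticsearch `_bulk` endpoint. The table layout, the index name, and the `build_bulk_body` helper are assumptions for the example.

```python
import json
import sqlite3


def build_bulk_body(conn: sqlite3.Connection, index_name: str = "company") -> str:
    """Serialize every company row as an action line plus a source line."""
    lines = []
    query = "SELECT id, name, active FROM company ORDER BY id"
    for company_id, name, active in conn.execute(query):
        # Action line: index the document under the database primary key,
        # so re-running the job overwrites documents instead of duplicating them.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(company_id)}}))
        # Source line: the document itself.
        lines.append(json.dumps({"name": name, "active": bool(active)}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

The resulting string would be POSTed to `/_bulk` with the `application/x-ndjson` content type. For a large table it would make sense to send it in batches rather than as one body.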