Hi, we're a Mexican NGO working across Latin America to stop corporate abuse. You can check out our bilingual website here: https://poderlatam.org
Our most data-intensive project is about public contracts; it even won a Sigma Awards data journalism prize (which we couldn't actually collect because of the pandemic, but still). We already have 5 million documents and counting. Our data ingestion engine is based on Apache NiFi and works quite well, that is, until we have to move the data from the processing cluster to the production cluster. In the production cluster we run a custom API that exposes the data under an open license, both for reuse by others and for our own apps.
We need a Secure Transactional Inter-cluster Replication (STIR) setup using Elastic, and there are only two of us developers working on this project (and many others!).
There are several challenges with this project right now, the main one being moving the data securely: it's about 20 GB per week. We would like to set up authentication in our clusters, both for added security and to allow inter-cluster replication. It's important to note that we don't want to replicate our data automatically from the processing cluster to the production cluster, because we need to check it for errors first; this is what we mean by "transactional".
We tried and failed to set up security in our Kubernetes-based cluster by following the official docs. And since we don't want to expose unsecured Elasticsearch clusters to the internet, our current process is just a manual dump and import of the data. To avoid downtime, we're thinking of keeping two copies of each index in production and using index aliases to switch from one to the other once the import is done.
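The two-index idea maps onto Elasticsearch's `_aliases` API, which applies all actions in a single request atomically, so clients querying the alias never see a moment with no index behind it. Below is a minimal sketch, assuming the API always queries an alias and never a physical index name; the alias `contracts` and the indices `contracts_blue` / `contracts_green` are hypothetical names.

```python
import json
import ssl
import urllib.request
from typing import Optional

ALIAS = "contracts"                                # hypothetical alias the API queries
INDICES = ("contracts_blue", "contracts_green")    # hypothetical pair of physical indices


def other_index(live: str) -> str:
    """Return the index that is NOT behind the alias, i.e. the one to import into."""
    if live not in INDICES:
        raise ValueError(f"unknown index: {live}")
    return INDICES[1] if live == INDICES[0] else INDICES[0]


def swap_body(alias: str, old: str, new: str) -> dict:
    """Body for POST /_aliases; Elasticsearch applies both actions atomically."""
    return {
        "actions": [
            {"remove": {"index": old, "alias": alias}},
            {"add": {"index": new, "alias": alias}},
        ]
    }


def post_swap(base_url: str, headers: dict, body: dict,
              cafile: Optional[str] = None) -> dict:
    """Send the alias swap over HTTPS; `headers` should carry the auth header."""
    ctx = ssl.create_default_context(cafile=cafile)
    req = urllib.request.Request(
        f"{base_url}/_aliases",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={**headers, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)
```

A weekly cycle would then be: find the live index, import into `other_index(live)`, run the error checks, and only then call `post_swap` with `swap_body(ALIAS, live, target)`. Keeping the previous index around until the next import also gives you a one-request rollback (swap back).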
Any help in improving this setup would be greatly appreciated.
Please note that our 2020 budget is long gone (otherwise we would hire the same consultants who helped us create the cluster in the first place), but maybe next year, if we continue to work together, we can talk business.
Thanks in advance, Martín and Fernando from PODER.