Now that rivers are being deprecated, I need a more future-proof, easy way of replicating MongoDB data to our Elasticsearch cluster. What would be the best solution for this?
We don't need to transform the data before storing it in Elasticsearch; for now we just want to index everything. I definitely want to avoid baking things into our code and reinventing wheels.
I think you can do it with Logstash.
It has a plugin that can use MongoDB as an input, and Elasticsearch can be configured as the output.
The other way is what you want to avoid, I guess: write a small application that loads records from MongoDB and inserts them into Elasticsearch through the bulk API. A minimal sketch of that approach is below.
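Something along these lines, roughly, assuming the official pymongo and elasticsearch Python clients (the hostnames, database, collection and index names are just placeholders):

```python
# One-shot copy of a MongoDB collection into Elasticsearch via the bulk API.
# Hostnames, database/collection and index names below are placeholders.
from pymongo import MongoClient
from elasticsearch import Elasticsearch, helpers

mongo = MongoClient("mongodb://localhost:27017")
es = Elasticsearch("http://localhost:9200")

def actions():
    # Stream every document from the collection as a bulk index action.
    for doc in mongo["mydb"]["mycollection"].find():
        doc_id = str(doc.pop("_id"))  # reuse Mongo's _id as the Elasticsearch id
        # Note: pre-7.x Elasticsearch versions also expect a "_type" field here.
        yield {"_index": "mycollection", "_id": doc_id, "_source": doc}

# helpers.bulk batches the actions and sends them through the bulk API.
helpers.bulk(es, actions())
```

Note this only does an initial load; it won't keep the index in sync as the MongoDB data changes afterwards.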
Thanks. These do seem to be the two best options.
We're not currently using Logstash at all, but I have been considering introducing it to our infrastructure for other purposes, so that would be the "better" solution. However, developing a small application would probably be the quickest and easiest for now.
I've been experimenting with the mongo-connector, and it seems to work quite well. I have it running and indexing our data in the simplest possible way.
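For reference, the basic invocation just points it at both clusters, something like this (the hosts are placeholders, the doc-manager name varies between mongo-connector and Elasticsearch versions, and MongoDB has to be running as a replica set so the connector can tail the oplog):

```
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic_doc_manager
```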
We've built a basic sandbox on top of our Elasticsearch server (because we're not running Kibana) to validate what's being indexed.
The next step is to finalise our API for access-controlled searches, and potentially make slight changes to what we're indexing so that our API implementation is as clean as it can possibly be.
I opted for the mongo-connector in the end because I didn't want to reinvent any wheels. Hoping there will be an official MongoDB input plugin soon.