Hi, our project uses MongoDB as a persistent data store (with rapidly
changing fields in documents), and elasticsearch as an index for the
invariant/slowly-changing fields in front.
We have about 100M documents (10M large, 100M small; average 100
fields/large-doc, 10 fields/small-doc) in a 6-node pseudo-operational
cluster (3 elasticsearch and 3 MongoDB). We are expecting our
operational deployments to be in the 10-20 node range.
We moved to elasticsearch from SOLR. elasticsearch was far easier to
integrate because both MongoDB and elasticsearch "naturally" use JSON.
(Actually our initial reason for migrating was geo-spatial
functionality; the fact that we could retire lots of code and have
fewer problems when schemas changed was a nice secondary benefit!)
A few aspects of our integration:
- We don't run MongoDB and elasticsearch on the same "physical" (/
logical) nodes because they are both pretty memory and disk-bandwidth
hungry (MongoDB more so than elasticsearch; we hardly run anything
else on our MongoDB nodes). The two instances should be connected by a
fast LAN though.
- We don't use a river to synchronize them. We control all
insertions/deletions/modifications into the data store, so we
can "mirror" the objects at that point (this is more efficient and
also lets us transform the objects to take advantage of
elasticsearch-specific features, e.g. in geo).
(That said, we had to write a custom ORM to support this maintainably,
so that was a downside. For smaller prototype projects this shouldn't
be necessary, however: just convert the object to both JSON and BSON,
e.g. using "gson", then insert one into elasticsearch and one into
MongoDB.)
- We retrieve the documents from MongoDB based on the (common) "_ids"
returned from elasticsearch. This part gets a "C+" at best -
elasticsearch is really fast, the MongoDB "$in" query is really fast,
but returning all the (large) documents from MongoDB is a bit slow
(1000 can take about 1.5s, dominated by network IO). I think MongoDB
have scope to speed up their network IO but it's acceptable for the
moment.
(If you need to perform analytics on very large numbers of documents
defined by a search, rather than "just" return the results of
searches, this method of integration may not be suitable. We're
investigating tighter coupling between the 2 platforms for this
purpose at the moment. FWIW I asked one of the 10gen lead engineers if
they had any tricks up their sleeves and he couldn't think of anything
on the spot.)
- We have a separate process for monitoring the synchronization
between the 2, somewhat similar to the scrutineer someone just posted
(I had a quick look at the code, and it looked like it would be very
easy to write a MongoDB driver to go along with the existing JDBC
ones).
- Not really an integration issue, but elasticsearch is far easier
to distribute across multiple nodes than MongoDB!
I can't think of anything else off the top of my head.
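As a rough illustration of the mirrored write path and the "_id"-based
read path from the list above, here is a minimal Python sketch. It is
not the actual ORM: the two dicts are in-memory stand-ins for the
stores, and every name in it (INDEXED_FIELDS, to_index_doc, save,
delete, search_ids, fetch, the sample fields) is hypothetical. The
only store-specific details it leans on are that a geo_point field
accepts a {"lat", "lon"} object and that "$in" does not return
documents in the order of the id list.

```python
import copy

# Hypothetical in-memory stand-ins for the two stores; a real deployment
# would call the MongoDB and elasticsearch clients here instead.
mongo_docs = {}  # _id -> full document (persistent store)
es_docs = {}     # _id -> indexed view (search index)

# Assumed split: only these slowly-changing fields get indexed.
INDEXED_FIELDS = ("title", "tags")


def to_index_doc(doc):
    """Build the search-side view of a document."""
    view = {k: doc[k] for k in INDEXED_FIELDS if k in doc}
    # Example transformation: fold separate lat/lon fields into the
    # {"lat": ..., "lon": ...} object form accepted by a geo_point field.
    if "lat" in doc and "lon" in doc:
        view["location"] = {"lat": doc["lat"], "lon": doc["lon"]}
    return view


def save(doc):
    """Single write path: store the full document, then mirror its index view."""
    mongo_docs[doc["_id"]] = copy.deepcopy(doc)
    es_docs[doc["_id"]] = to_index_doc(doc)


def delete(doc_id):
    """Remove a document from both stores."""
    mongo_docs.pop(doc_id, None)
    es_docs.pop(doc_id, None)


def search_ids(tag):
    """Stand-in for a search: _ids of documents carrying the tag, in a fixed order."""
    return sorted(i for i, d in es_docs.items() if tag in d.get("tags", []))


def fetch(ids):
    """Stand-in for an {"_id": {"$in": ids}} lookup, restoring the search order."""
    found = {i: mongo_docs[i] for i in ids if i in mongo_docs}
    return [found[i] for i in ids if i in found]


save({"_id": "a1", "title": "Depot", "tags": ["warehouse"],
      "lat": 51.5, "lon": -0.1, "views": 10})
save({"_id": "b2", "title": "Office", "tags": ["hq"], "views": 3})
save({"_id": "c3", "title": "Annex", "tags": ["warehouse"], "views": 7})
results = fetch(search_ids("warehouse"))
```

The rapidly changing "views" field stays out of the index view, so
updating it only touches the data store. The re-ordering step in
fetch matters because the search ranking would otherwise be lost on
the way back from the "$in" lookup.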
So in summary, elasticsearch integrates with MongoDB much better than
SOLR (as well as being better for our application in many other ways).
It's easy to get up-and-running, though there are a few issues for
bigger/more complex code.
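For the synchronization monitoring mentioned above, one simple way
to check the two stores against each other is to diff their ids. The
sketch below is only an illustration of that idea, not the monitoring
process described in this post and not how scrutineer works
internally. It assumes (hypothetically) that each document carries a
comparable version number on both sides; find_drift and the sample
data are made-up names.

```python
def find_drift(store_versions, index_versions):
    """Compare {_id: version} maps taken from the data store and the index."""
    store_ids, index_ids = set(store_versions), set(index_versions)
    return {
        # In the data store but never indexed.
        "missing_from_index": sorted(store_ids - index_ids),
        # Still indexed although deleted from the data store.
        "orphaned_in_index": sorted(index_ids - store_ids),
        # Present on both sides, but the versions disagree.
        "stale_in_index": sorted(
            i for i in store_ids & index_ids
            if store_versions[i] != index_versions[i]
        ),
    }


# Hypothetical snapshots; in practice each map would be read from its store.
store = {"a1": 3, "b2": 1, "c3": 2}
index = {"a1": 3, "c3": 1, "d4": 5}
report = find_drift(store, index)
```

With 100M documents the two maps would not fit comfortably in memory,
so a real checker would more likely walk both stores in sorted "_id"
order and compare as it goes.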
On Nov 15, 9:46 am, Shreyas Desai shre...@bhagda.com wrote:
Guys,
People might have already asked question about ES + MongoDB
integration.
But google didn't return any result for "mongodb integration"
So, has anyone successfully integrated mongodb with ES?
Is SOLR better to integrate with MongoDB? Any recommendations?
Thanks,
Shreyas