MongoDB integration – current status?


(hbf-2) #1

Dear everybody,

I have seen several messages on this forum [1,2] that ask for or
describe approaches to integrate Elastic Search into MongoDB. Has
there been any recent development on this? How hard is it to get it
done?

I am looking for a document store that can save attachments such as
Office and PDF documents and provides full-text search.

Many thanks for any updates or pointers,
Kaspar

[1] http://elasticsearch-users.115913.n3.nabble.com/ES-with-Mongodb-tp1980927p1980927.html
[2] http://elasticsearch-users.115913.n3.nabble.com/ES-and-MongoDB-integration-td1209794.html


(Chris Berkhout) #2

Hi Kaspar,

As far as formal integration goes, I searched the MongoDB docs and
discussions a few weeks ago for what to do about full-text search. ES
was discussed as a candidate, but nothing is officially recommended or
integrated yet.

I've just started building a Rails app that uses MongoDB via Mongoid
and ES via tire. It's working well so far. Updating Mongoid models
will automatically send updates to ES.

I haven't gotten to attachments yet, but I believe you encode the data
in Base64 and put it in a JSON request to ES for indexing. I plan to
start on that over the coming weeks.
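
As a rough illustration of the Base64 step, here is a minimal Python
sketch that only builds the JSON body. The field names "filename" and
"file" are made up for the example, and how ES must be configured to
extract text from such a field is not covered here.

```python
import base64
import json


def build_attachment_body(filename: str, data: bytes) -> str:
    """Return a JSON document that carries the raw file bytes as Base64 text."""
    # JSON cannot hold raw bytes, so the content is Base64-encoded first.
    encoded = base64.b64encode(data).decode("ascii")
    # "filename" and "file" are hypothetical field names for this sketch.
    return json.dumps({"filename": filename, "file": encoded})


# Example with stand-in bytes instead of a real PDF read from disk.
body = build_attachment_body("report.pdf", b"%PDF-1.4 example bytes")
decoded = base64.b64decode(json.loads(body)["file"])
```

The resulting string would then be sent as the body of the indexing
request; decoding it again yields the original bytes unchanged.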

I guess you're looking for more direct integration, but what
specifically? Something like the CouchDB river might indeed be good.

There are no triggers in Mongo (here is the discussion of including
them in the future: https://jira.mongodb.org/browse/SERVER-124).
However, there is the "oplog", a fixed-size, rotating log of
operations that is used for replication. You can use a tailable
cursor to follow all changes and respond to them, but you need to
deal gracefully with the case of your cursor dying because entries
were expired from the log before you had processed them
(http://www.mongodb.org/display/DOCS/Tailable+Cursors).
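
To make the idea more concrete, here is a small Python sketch of the
translation step only. It assumes oplog entries shaped like the ones
MongoDB writes ("op" is "i", "u" or "d", "ns" is "db.collection", "o"
holds the document or update, "o2" holds the _id of an updated
document). The entries are hand-written stand-ins for what a tailable
cursor would return, plain integers stand in for BSON timestamps, and
the action names are invented for the example.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def oplog_to_actions(
    entries: Iterable[Dict[str, Any]],
    namespace: str,
    last_ts: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Translate oplog-style entries into search-index actions."""
    for entry in entries:
        ts = entry["ts"]
        # Skip everything already handled before a restart.
        if last_ts is not None and ts <= last_ts:
            continue
        # Only follow one collection; no-ops and other namespaces are ignored.
        if entry.get("ns") != namespace:
            continue
        op = entry.get("op")
        if op == "i":
            yield {"action": "index", "id": entry["o"]["_id"], "doc": entry["o"], "ts": ts}
        elif op == "u":
            # An update may carry only modifiers, so the full document
            # has to be fetched again before it can be re-indexed.
            yield {"action": "refetch", "id": entry["o2"]["_id"], "ts": ts}
        elif op == "d":
            yield {"action": "delete", "id": entry["o"]["_id"], "ts": ts}


def needs_full_resync(oldest_ts: int, last_ts: int) -> bool:
    """True if the log has rotated past the last entry we processed."""
    return oldest_ts > last_ts


entries = [
    {"ts": 1, "op": "i", "ns": "docs.files", "o": {"_id": "a", "title": "Report"}},
    {"ts": 2, "op": "u", "ns": "docs.files", "o2": {"_id": "a"}, "o": {"$set": {"title": "Final"}}},
    {"ts": 3, "op": "n", "ns": "", "o": {"msg": "periodic noop"}},
    {"ts": 4, "op": "d", "ns": "docs.files", "o": {"_id": "a"}},
]
actions = list(oplog_to_actions(entries, "docs.files", last_ts=1))
```

Remembering the last processed timestamp is what lets a follower
resume after its cursor dies. If the oldest entry still in the log is
newer than that timestamp, changes may have been lost and a full
re-index is the safe fallback.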

Some people are doing DIY triggers by tailing the oplog:
http://www.snailinaturtleneck.com/blog/2010/10/27/bending-the-oplog-to-your-will/
http://groups.google.com/group/mongodb-user/browse_thread/thread/5c18b630399d996e

For now I'm happy to handle it at the app framework/app level.

Hope that helps!
Chris


(Shay Banon) #3

Heya,

Yea, I had a chat with a mongo fellow at bbuzz, and it seems like tailing the oplog might work. It's simply a lot of delicate work if it needs to be done right, especially when it comes to supporting replica sets and sharding (and resharding...).


(hbf-2) #4

Hi Chris, hi Shay,

Thanks for sharing your experience with integrating MongoDB and ES.

I am using MongoDB to store documents with attachments (Office
documents and PDFs), and as MongoDB itself does not come with any
full-text search, I was looking into Elastic Search integration. The
kind of integration I was hoping for is, just as you say, one at a
lower level: when I save/update/delete something in MongoDB, the
respective changes would be propagated to ES automatically.

However, as you also say, there does not seem to be any working
integration at this level. I will therefore look into an
application-level integration.
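
Roughly, an application-level integration could look like the Python
sketch below. Both classes are hypothetical: plain dictionaries stand
in for MongoDB and for the ES index, so this shows only the shape of
the approach (every write goes through one place that updates both
sides), not real driver calls.

```python
from typing import Any, Dict


class InMemoryIndex:
    """Stand-in for the search index (hypothetical, not an ES client)."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}

    def index(self, doc_id: str, doc: Dict[str, Any]) -> None:
        self.docs[doc_id] = dict(doc)

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)


class IndexedStore:
    """Writes each change to the primary store, then mirrors it to the index."""

    def __init__(self, index: InMemoryIndex) -> None:
        self.primary: Dict[str, Dict[str, Any]] = {}  # stand-in for MongoDB
        self.index = index

    def save(self, doc_id: str, doc: Dict[str, Any]) -> None:
        self.primary[doc_id] = dict(doc)
        self.index.index(doc_id, doc)

    def delete(self, doc_id: str) -> None:
        self.primary.pop(doc_id, None)
        self.index.delete(doc_id)


store = IndexedStore(InMemoryIndex())
store.save("a", {"title": "Report"})
store.save("a", {"title": "Final report"})  # an update is just another save
indexed_title = store.index.docs["a"]["title"]
store.delete("a")
```

The weak spot of this approach is that any write that bypasses the
wrapper, or a failure between the two steps, leaves the index out of
date, which is why a lower-level hook would be nicer.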

I would like to contribute to a MongoDB-ES integration but I do not
know enough about the MongoDB internals to do this. I was hoping that
a nice trigger API might come out at some point that would give me
nice hooks to call ES.

Shay, do you see any way of starting something that we could build on?

Best,
Kaspar


(Shay Banon) #5

Nothing much other than starting to learn the MongoDB oplog API and how it works in a distributed setup (replica sets and the like); then we can move forward (I don't know enough about it).

