ES and MongoDB integration


(Diptamay) #1

Hi All

We are pretty sure that we are going to move our whole CMS
backend from Oracle+Memcache to ES+MongoDB. We have already started
implementing ES for this. The current plan is for the application
to send independent updates to ES and MongoDB.
Ideally, I would like this integration to be along the same lines as
the Terrastore-ES integration, where event listeners on Terrastore
send the updates along to ES over memory queues.
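
The listener-plus-memory-queue idea can be sketched roughly as follows. This is only an illustration in plain Java: `QueueMirror`, `Update`, and `Indexer` are hypothetical names, not part of ES, MongoDB, or Terrastore, and the queue is drained on demand rather than by a background thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: none of these types come from ES, MongoDB, or Terrastore.
class QueueMirror {
    /** One change that was written to the primary store. */
    static final class Update {
        final String id;
        final String json;
        Update(String id, String json) { this.id = id; this.json = json; }
    }

    /** Stand-in for whatever actually writes to the search index. */
    interface Indexer {
        void index(Update update);
    }

    private final BlockingQueue<Update> queue = new LinkedBlockingQueue<>();
    private final Indexer indexer;

    QueueMirror(Indexer indexer) { this.indexer = indexer; }

    /** Called by the store's event listener after a successful write. */
    void onStored(Update update) {
        queue.add(update);
    }

    /** Forwards everything currently queued to the indexer; returns how many were sent. */
    int drain() {
        List<Update> batch = new ArrayList<>();
        queue.drainTo(batch);
        for (Update u : batch) {
            indexer.index(u);
        }
        return batch.size();
    }
}
```

The point of the queue is that the application writes once, to the data store, and the index is updated as a side effect instead of through a second, independent write.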

I was wondering whether anybody has looked into ES-MongoDB integration.
If they have, what's the state of the code like? Is it available as
open source? If not, can we develop something?

Cheers!
Diptamay


(James Cook) #2

We have done a similar thing, but chose Hazelcast -> ES instead of ES ->
MongoDB. It was hard to justify storing the exact same data in ES and in
MongoDB when ES has better querying capabilities and more stable
partitioning/sharding IMHO.

I hope you follow up with your experiences.

-- jim


(Diptamay) #3

Our intended use of MongoDB is more as a data store (obviously sans
ACID), running on Solaris and ZFS which would replace Oracle rather
than using it as a smart cache. Yes, ES definitely has better querying
abilities and as a matter of fact we plan to use ES not only for
searching but even as a front cache for most of our reads.

Shall follow up with experiences.

-Diptamay


(Shay Banon) #4

Based on what I could gather (I haven't investigated it much,
though), it seems MongoDB doesn't have anything like post
commit/op/action hooks that would let you easily mirror changes to
elasticsearch. There was some talk (I think on the mailing list) that the
commit log could be used to hack it, but I'm not sure how easy that is or
how officially it is supported in MongoDB.

-shay.banon


(Alberto Paro-2) #5

Flavio Percoco (aka Flaper87) is working with the MongoDB team on a MongoDB extension that uses elasticsearch for full-text search.

Look him up on GitHub to contact him.

Hi, Alberto


(Shay Banon) #6

That's great! Do you know if it will be open-sourced?


(Alberto Paro-2) #7

On 24 Aug 2010, at 13:06, Shay Banon wrote:

That's great! Do you know if it will be open-sourced?

Yes, it will be released as open source. Flaper87 is waiting for some refactoring from the MongoDB guys.

Hi,
Alberto Paro


(AGuereca) #8

Hi, Alberto.

Are you aware of any updates on this?
I have looked through Flaper87's projects on GitHub but I'm unable to find anything.

Thanks !


(Alberto Paro-2) #9

Flaper87 was working on it. I can put the plugins in my own new branch of ES on GitHub. I'll try to do it this week.
Writing an ES river is very easy.


(Shay Banon) #10

That would be great! It would be interesting to see how you solved it.


(Enrique Medina Montenegro) #11

Alberto,

Do we have any updates about ES-MongoDB integration?

Thanks.


(Alberto Paro-2) #12

I did an initial import of the river into my branch on GitHub: https://github.com/aparo/elasticsearch
Flavio used the river for doing massive imports of data.
I need to create some unit tests and docs before sending a pull request to kimchy.

The river is very similar to the CouchDB one.

You can download it and build it the usual way (./gradlew release), or debug it in IDEA (I updated the IntelliJ project).

I updated the Java MongoDB driver to 2.5.3 and changed the river to use the new address API in the MongoDB driver.

Hi,
Alberto


(Enrique Medina Montenegro) #13

Alberto,

After a quick look at your code I found this line 224:

https://github.com/aparo/elasticsearch/blob/master/plugins/river/mongodb/src/main/java/org/elasticsearch/river/mongodb/MongoDBRiver.java#L224

And I wonder if there's a typo with --> field("lats_id")

On another note, is there any description of what this river does?

Thanks.


(Enrique Medina Montenegro) #14

Also here:

https://github.com/aparo/elasticsearch/blob/master/plugins/river/mongodb/src/main/java/org/elasticsearch/river/mongodb/MongoDBRiver.java#L284


(ildella) #15

Hi. I'm also interested in Mongo-ES integration. I was wondering today how to do that, and a trigger system at the Mongo level seems the most natural way. There's an open issue, not scheduled yet, here: https://jira.mongodb.org/browse/SERVER-124

I had a look at the code, and it looks to me like a batch process that wakes up, checks for the last Mongo id that has been indexed in ES, and then indexes all the newly stored objects it can find in the Mongo database, based on the id it just retrieved.
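
A minimal sketch of that polling scheme, assuming ids that only ever increase. In-memory maps stand in for the Mongo collection and the ES index, and `LastIdPoller` is a hypothetical name, not the river's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: in-memory maps stand in for the Mongo collection and the ES index.
class LastIdPoller {
    private final NavigableMap<Long, String> source;      // id -> document, sorted by id
    private final Map<Long, String> index = new TreeMap<>();
    private long lastIndexedId = -1;                       // checkpoint; -1 means nothing indexed yet

    LastIdPoller(NavigableMap<Long, String> source) { this.source = source; }

    /** One wake-up of the batch: index every document whose id is greater than the checkpoint. */
    List<Long> pollOnce() {
        List<Long> indexed = new ArrayList<>();
        for (Map.Entry<Long, String> e : source.tailMap(lastIndexedId, false).entrySet()) {
            index.put(e.getKey(), e.getValue());
            lastIndexedId = e.getKey();                    // advance the checkpoint as we go
            indexed.add(e.getKey());
        }
        return indexed;
    }

    long lastIndexedId() { return lastIndexedId; }

    int indexedCount() { return index.size(); }
}
```

A scheme like this only sees new documents; updates to and deletes of already indexed documents go unnoticed, which is one reason a trigger or log-based approach is attractive.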

Has anyone used the plugin? I don't have much experience with ES, so I think that to try it I should:

  1. build it (ok...)
  2. copy the built jar and the mongo java jar under $ES/plugins/river-mongodb
  3. configure something accordingly, probably the elasticsearch.yml with a "mongo" section and all the needed parameters.
  4. nothing more? probably not... I'll try tomorrow :)

Of course, I'd like to explore the possibility of making this multi-index and multi-collection.

Coming back to a deeper Mongo integration, I think that the only good way is some trigger in Mongo like the one discussed in that JIRA issue. What do you think?


(Bryan Green) #16

bump


(ildella) #17

I've made a simple test:

    // Start a local node, then register the river by indexing its _meta document.
    Node node = NodeBuilder.nodeBuilder().node();
    Client client = node.client();
    InputStream mongo = getClass().getClassLoader().getResourceAsStream("mongodb-river.json");
    byte[] input = IOUtils.toByteArray(mongo);
    client.prepareIndex("_river", "db", "_meta").setSource(input).execute().actionGet();

and the river starts, but I got this:

    Exception in thread "elasticsearch[Jessica Jones]mongodb_river_slurper-pool-20-thread-1" java.lang.IllegalStateException: can't call authenticate twice on the same DBObject
        at com.mongodb.DB.authenticate(DB.java:430)
        at org.elasticsearch.river.mongodb.MongoDBRiver$Tailer.run(MongoDBRiver.java:269)

That is reasonable.

Now that I have a test in place (*), I can work on the river to make it
work, and also do some refactoring.
I'm on the aparo fork now. I could create my own, but I'm not that
familiar with git. What's your opinion?

Also, I'm not familiar with gradle. In less than a minute I created a
basic pom that lets me work nicely with the mongodb river, using the
es 0.17 snapshot as a dependency, and everything gets along nicely in
Eclipse. That's all I need to make it work, and I won't have the time
to make it work with gradle as well (changing dependencies, dealing
with the horrible Eclipse integration and generally dealing with
gradle... ).

