Elasticsearch & mongo - sync

John_Merrells · August 3, 2010, 7:01pm

Hello,

I have a bunch of documents stored in mongo that I'm indexing with elasticsearch.
Would there happen to be a handy sync script/tool to keep elasticsearch up to
date wrt a mongo container...? Has anyone on the list been down this road before
and care to share some wisdom?

John

kimchy · August 4, 2010, 5:55am

I know that some people have expressed interest in it. I don't have much
mongo experience (actually, none...), but two suggestions: either hook into
a "post commit" hook in mongo, or query it in some way for the latest
changes. Another option, which is more on the application side, is to apply
the same changes made to mongo to elasticsearch as well. I'm sure you've
thought of these already ...

-shay.banon

John_Merrells · August 4, 2010, 1:26pm

On Aug 3, 2010, at 10:55 PM, Shay Banon wrote:

I know that some people have expressed interest in it. I don't have much mongo experience (actually, none...), but two suggestions: either hook into a "post commit" hook in mongo, or query it in some way for the latest changes. Another option, which is more on the application side, is to apply the same changes made to mongo to elasticsearch as well. I'm sure you've thought of these already ...

My current plan is to have an updated_at datetime field on each document.
With that I can ask elasticsearch for the max datetime it has and then ask
mongo for all the documents with a datetime greater than that. The only
twisty bit so far is that the datetime is going to need to be stored as a string
on mongo, and a long on elasticsearch, but in theory it should work.

John
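
A minimal sketch of the plan above, assuming updated_at is a plain number on both sides. The two stores are stand-ins (a list for the mongo collection, a dict for the elasticsearch index), and the names max_updated_at, changed_since and sync are hypothetical, not part of either product's API:

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def max_updated_at(index_docs: Dict[str, Doc]) -> int:
    """Highest updated_at already present in the index, or -1 if it is empty."""
    return max((doc["updated_at"] for doc in index_docs.values()), default=-1)


def changed_since(source_docs: List[Doc], since: int) -> List[Doc]:
    """Source documents whose updated_at is strictly greater than `since`."""
    changed = [doc for doc in source_docs if doc["updated_at"] > since]
    return sorted(changed, key=lambda doc: doc["updated_at"])


def sync(source_docs: List[Doc], index_docs: Dict[str, Doc]) -> int:
    """Copy every document newer than the index's high-water mark.

    Returns the number of documents written to the index.
    """
    since = max_updated_at(index_docs)
    changed = changed_since(source_docs, since)
    for doc in changed:
        # Keyed by _id, so a re-sent document overwrites its older copy.
        index_docs[doc["_id"]] = dict(doc)
    return len(changed)
```

In a real setup the two helper functions would be replaced by a max query against elasticsearch and a range query against mongo. Note that the strict "greater than" comparison can miss documents that share the highest timestamp already indexed.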

kimchy · August 4, 2010, 1:31pm

Sounds good. There is no long type in mongo?

John_Merrells · August 4, 2010, 1:38pm

On Aug 4, 2010, at 6:31 AM, Shay Banon wrote:

Sounds good. There is no long type in mongo?

Doh. Yes, there is. That'd be simpler.

I got there because I started looking at the Date type, but on mongo
it's hard to query it through the protocol, and on elasticsearch there's
no max for date... so then I tried string... then long... so yeah, I should
just use long on both...

John

John_Merrells · August 4, 2010, 1:48pm

On Aug 4, 2010, at 6:31 AM, Shay Banon wrote:

Sounds good. There is no long type in mongo?

Sorry to bore everyone with my early morning pre-coffee mumblings,
but 'float' on both mongo and elastic is most probably the way to go.

John

kimchy · August 4, 2010, 1:49pm

Why float? I assumed a long, representing it as a timestamp. The date type in
elasticsearch is just a facade on top of long.

-shay.banon
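
To illustrate the long-as-timestamp idea, here is a small sketch that stores a moment as whole milliseconds since the Unix epoch, which fits in a long on both sides. The millisecond resolution and the function names are assumptions made for the example, not something the thread settled on:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    delta = moment - EPOCH
    # Integer arithmetic only, so no float rounding creeps in.
    return (delta.days * 86_400_000
            + delta.seconds * 1000
            + delta.microseconds // 1000)


def from_epoch_millis(millis: int) -> datetime:
    """Convert milliseconds since the epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


def now_millis() -> int:
    """Current time as an integer millisecond timestamp."""
    return to_epoch_millis(datetime.now(timezone.utc))
```

An integer timestamp compares and sorts exactly, whereas a float of seconds can lose precision when it passes through different serializers.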

John_Merrells · August 4, 2010, 1:59pm

On Aug 4, 2010, at 6:49 AM, Shay Banon wrote:

Why float? I assumed a long, representing it as a timestamp. The date type in elasticsearch is just a facade on top of long.

There could be many updates in a second... In Ruby Time.now.to_f returns a Float....

I've just realized that mongo can auto generate ids, which are a bit like GUIDs and have
a seconds field and a counter field within them... so I could extract those bits and use
them, but it'd amount to much the same thing, and be a bit opaque.

Still pre coffee.... still rambling....

John
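
For reference, a sketch of extracting those two fields from the 24-character hex form of a mongo ObjectId, which is 12 bytes: a 4-byte big-endian seconds value first and a 3-byte counter last. The function name and the sample id in the usage note are made up for illustration:

```python
from typing import Tuple


def split_object_id(hex_id: str) -> Tuple[int, int]:
    """Return (seconds_since_epoch, counter) from an ObjectId hex string."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[:4], "big")   # leading 4 bytes: timestamp
    counter = int.from_bytes(raw[9:], "big")   # trailing 3 bytes: counter
    return seconds, counter
```

For example, split_object_id("4c585a2f001122334400002a") yields the pair (0x4c585a2f, 42). The counter is only unique per generating process, so the pair orders documents reliably only when a single writer creates the ids.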

kimchy · August 4, 2010, 2:03pm

Auto-increment GUIDs are the best, but with timestamps, as you suggested,
there might be several within the same resolution. One way to work around
that is to create the query so that you subtract the resolution you get
on your machine (1 milli for example), use "index" on whatever falls within
it, and use "create" on the rest (assuming you know that they don't exist in
elasticsearch).

By the way, the most difficult part when it comes to syncs is handling
deletes. I assume you don't have them?

-shay.banon
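
A rough sketch of that overlap idea, assuming integer timestamps and a resolution of one unit. Documents inside the overlap window may already be in elasticsearch, so they get an "index" operation (which overwrites), while anything newer gets "create". The function name and the tuple format are hypothetical:

```python
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


def plan_operations(source_docs: List[Doc],
                    last_synced: int,
                    resolution: int = 1) -> List[Tuple[str, str]]:
    """Decide, per document, whether to send "index" or "create".

    Documents at or before (last_synced - resolution) are skipped as already
    synced. Documents inside the window (last_synced - resolution, last_synced]
    are re-sent with "index". Everything after last_synced is sent with
    "create", on the assumption that it is not in the index yet.
    """
    window_start = last_synced - resolution
    operations = []
    for doc in sorted(source_docs, key=lambda d: d["updated_at"]):
        stamp = doc["updated_at"]
        if stamp <= window_start:
            continue
        op = "index" if stamp <= last_synced else "create"
        operations.append((op, doc["_id"]))
    return operations
```

The "create" branch only holds if documents newer than the high-water mark were never indexed before; a document that was updated after being indexed would need "index" as well.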

John_Merrells · August 4, 2010, 2:30pm

On Aug 4, 2010, at 7:03 AM, Shay Banon wrote:

By the way, the most difficult part when it comes to syncs is handling deletes. I assume you don't have them?

No deletes.

John
