I am using MongoDB as my primary DB (we will host everything on AWS, and our database will be sharded), and I want to use Elasticsearch to provide full-text search support. I have various "types" of documents in my database (which are updated and removed frequently), for example "users", "file-metadata", "logs", etc., and I want to provide real-time search support on a variety of fields for all document types (i.e., as soon ("soon" <= 1 sec) as anything in the DB changes, the ES server should be updated as well). I am thinking of the following strategy:
Run a daemon in the background (probably on the MongoDB host itself) which keeps doing this operation every second (or even more often):

1. Fetch a timestamp "T" from the ES server (where to keep it in ES is what I want to ask you guys).
2. Find all MongoDB documents which were modified after "T", and push them to ES (all documents in my case have a last-modified time field, and it is indexed).
3. Update timestamp "T" on the ES server to the most recent value among all documents pushed (if none were pushed, do nothing).

Repeat 1-3 forever (a rough sketch of this loop is below).
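This is roughly what I picture, assuming pymongo and a recent official elasticsearch Python client. The database, collection, index, and field names (mydb, file_metadata, file-metadata, modified, sync-state) are placeholders of mine, and I am assuming "modified" is stored as a numeric (epoch) timestamp; this is only a sketch, not a finished design:

    import time
    from pymongo import MongoClient
    from elasticsearch import Elasticsearch

    mongo = MongoClient()["mydb"]                  # placeholder database name
    es = Elasticsearch()

    def read_t():
        # Step 1: fetch "T" from ES; here it lives as a single document in a
        # tiny "sync-state" index, which is exactly the design decision I ask
        # about further down.
        try:
            return es.get(index="sync-state", id="last_sync")["_source"]["t"]
        except Exception:
            return 0  # first run: sync everything

    def transform(doc):
        # My pre-processing hook: add/remove fields before pushing to ES.
        doc = dict(doc)
        doc.pop("_id", None)  # ES keeps the id outside the source document
        return doc

    def sync_once():
        t = read_t()
        newest = t
        # Step 2: push every document modified after "T"
        # (assumes "modified" is a numeric epoch timestamp).
        for doc in mongo["file_metadata"].find({"modified": {"$gt": t}}):
            es.index(index="file-metadata", id=str(doc["_id"]), body=transform(doc))
            newest = max(newest, doc["modified"])
        # Step 3: advance "T" only if something was pushed.
        if newest > t:
            es.index(index="sync-state", id="last_sync", body={"t": newest})

    while True:   # Repeat forever, roughly once a second.
        sync_once()
        time.sleep(1)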
Note: the above approach does not work for document deletions in the MongoDB database. I was thinking of making a separate collection in MongoDB which keeps track of deleted objects, so that the daemon can read it perpetually too and make the required changes on the ES server.
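Continuing the sketch above (same mongo and es handles), the deletion side could look roughly like this, assuming a hypothetical deleted_objects tombstone collection into which the application writes the document's _id, its target index, and a deleted_at time whenever it removes something:

    def sync_deletes(last_delete_t):
        # Read tombstones recorded after the previous pass and mirror the deletes in ES.
        newest = last_delete_t
        for tomb in mongo["deleted_objects"].find({"deleted_at": {"$gt": last_delete_t}}):
            try:
                es.delete(index=tomb["index"], id=str(tomb["_id"]))
            except Exception:
                pass  # the document may never have reached ES, so nothing to delete
            newest = max(newest, tomb["deleted_at"])
        return newest  # where to persist this watermark is the same open question as "T"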
The question I want to ask is: where should I store the value "T" on the ES server? As you can see from the above design, "T" needs to be fetched and updated very often (possibly every second). Should I keep "timestamp" as a special document in the same index/type (where I keep my data), fetch it using a special query, and update it like a normal document at the end of each push (will a change in the value of T cause re-indexing every time?). Or is there some metadata which I can store with a particular "index" and update/fetch frequently (without needing to index it)? Please note that the value "T" is different from the time ES was last updated ("T" is actually the last-modified timestamp of the last MongoDB document updated/added in ES).
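The "special document" option is essentially what the loop sketch above does (a document with a well-known id, overwritten at the end of each push). The per-index metadata idea I have in mind is the mapping's _meta section, roughly as below; this is only an assumption on my part (both the API shape and whether hitting the mapping this often is reasonable):

    # Assumption: the mapping's _meta section can hold arbitrary key/values and
    # is not indexed as a document; updating it is a mapping change, not a write.
    def write_t_meta(t):
        es.indices.put_mapping(index="file-metadata", body={"_meta": {"last_sync_t": t}})

    def read_t_meta():
        mapping = es.indices.get_mapping(index="file-metadata")
        return mapping["file-metadata"]["mappings"]["_meta"]["last_sync_t"]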
Can someone see any obvious problem with this solution? Is there a better way to do this (a river? Please note that my DB is sharded, and I also need to pre-process documents (e.g., add/remove some fields) before pushing them to ES). All suggestions/comments are welcome.
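For completeness, by pre-processing I mean reshaping each MongoDB document before it is indexed; if the one-by-one es.index calls in the sketch above turn out to be too slow, I imagine batching the transformed documents through the client's bulk helper, something like this (index name still a placeholder):

    from elasticsearch import helpers

    def push_batch(docs):
        # Pre-process each MongoDB document, then send a single bulk request to ES.
        actions = [
            {
                "_index": "file-metadata",      # placeholder index name
                "_id": str(doc["_id"]),
                "_source": transform(doc),      # same pre-processing hook as above
            }
            for doc in docs
        ]
        helpers.bulk(es, actions)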