Keeping datastore and index in sync [Push Vs Pull] - best approach

From what I read, you can use either a push model or a pull model to update
indexes incrementally and keep the datastore and the index in sync.

By pull, I mean using a river: the JDBC river if it is a SQL DB, or writing
your own river.

By push, I mean implementing an indexer that updates the index through API
calls.

What is the best approach if you have a complex schema for your index, say
inner objects that are complex types themselves?

To further add to the details/requirements,

Push model: suppose an indexer is implemented that reads from a data source
[holding changes or index document data], which the application feeds at the
time of a change or whenever it intends to update the index. The indexer
reads from this store and pushes the changes to the index using the API.
What if multiple indexes need to be updated from the same data source or
application? What if you want to scale the indexer to support multiple
applications that update their indexes through the same indexer?
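
To make the multi-index, multi-application case concrete, here is a minimal sketch of the routing step of such an indexer. It only builds the newline-delimited JSON body that the Elasticsearch bulk API accepts; actually sending it to the `_bulk` endpoint is left out. The `ROUTES` table, the application names, the index names and the shape of the change events are all hypothetical, and older Elasticsearch versions would also need a `_type` in each action line.

```python
import json
from collections import deque

# Hypothetical routing table: which indexes each application feeds.
ROUTES = {
    "registrar": ["students", "students_search"],
    "library": ["loans"],
}


def to_bulk_body(changes, routes):
    """Turn queued change events into a bulk request body (NDJSON).

    Each change is a dict with the assumed keys "app", "op", "id" and,
    for index operations, "doc".
    """
    lines = []
    for change in changes:
        # One change may fan out to several indexes.
        for index in routes.get(change["app"], []):
            meta = {"_index": index, "_id": change["id"]}
            if change["op"] == "delete":
                # Delete actions have no source line.
                lines.append(json.dumps({"delete": meta}))
            else:
                lines.append(json.dumps({"index": meta}))
                lines.append(json.dumps(change["doc"]))
    # The bulk format requires a trailing newline after the last line.
    return ("\n".join(lines) + "\n") if lines else ""


# The store that the applications feed, modelled here as an in-memory queue.
queue = deque([
    {"app": "registrar", "op": "index", "id": "1",
     "doc": {"Name": "Ada", "YearOfBirth": 1990}},
    {"app": "library", "op": "delete", "id": "7"},
])

body = to_bulk_body(queue, ROUTES)
print(body, end="")
```

Because the routing lives in data rather than in code, adding another application or another target index would only mean adding an entry to the table.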

Pull model, specifically a river that accesses the DB at the poll rate: what
is the story around reading only the data updated since the last timestamp?

Can the river store the last parameter it used somewhere in the river state,
so that the next run can use it as a parameter of the query?
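
For comparison, this is roughly what the checkpoint idea looks like in an external poller. It is a sketch under assumptions, not how any particular river works: the `student` and `poll_state` tables, the `updated_at` column and the poller name are made up, and SQLite stands in for the real database.

```python
import sqlite3


def fetch_changes(conn, last_seen):
    """Return rows modified after last_seen, plus the new checkpoint."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM student "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Keep the old checkpoint when nothing changed.
    new_checkpoint = rows[-1][2] if rows else last_seen
    return rows, new_checkpoint


def poll_once(conn):
    """Run one poll cycle and persist the checkpoint for the next run."""
    (last_seen,) = conn.execute(
        "SELECT last_seen FROM poll_state WHERE name = 'student_poller'"
    ).fetchone()
    rows, checkpoint = fetch_changes(conn, last_seen)
    # A real poller would index the rows here and only then save the checkpoint.
    conn.execute(
        "UPDATE poll_state SET last_seen = ? WHERE name = 'student_poller'",
        (checkpoint,),
    )
    conn.commit()
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.execute("CREATE TABLE poll_state (name TEXT PRIMARY KEY, last_seen INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)", [(1, "Ada", 100), (2, "Bob", 200)]
)
conn.execute("INSERT INTO poll_state VALUES ('student_poller', 0)")

first = poll_once(conn)   # initial run picks up every row
conn.execute("UPDATE student SET name = 'Robert', updated_at = 300 WHERE id = 2")
second = poll_once(conn)  # later run picks up only the changed row
third = poll_once(conn)   # nothing changed since the last checkpoint
```

Two caveats with this approach: a strict `>` comparison can miss rows that share the checkpoint's timestamp but are committed later, and deleted rows never show up in the query at all, so deletes need a separate mechanism.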

Another question: if the index schema has a complex structure, say,

Student {
    Name,
    YearOfBirth,
    PhoneNumbers [
        {
            Number,
            Type
        }
    ]
}

This is a sample object model stored in the index. How will the JDBC river
create JSON objects for this array structure of inner objects? I have seen
that it is smart enough to figure out the grouping based on a field in the
query and create a simple one-field array, but I am not sure whether it works
with complex array types.
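
To show the shape of the problem, here is a sketch of how an external process could fold joined rows into the Student structure above. It says nothing about what the JDBC river itself does; the column order and the sample rows are hypothetical and assume a LEFT JOIN between a student table and a phone table.

```python
# Rows as a join of student and phone tables might return them:
# (student_id, name, year_of_birth, phone_number, phone_type)
rows = [
    (1, "Ada", 1990, "555-0100", "home"),
    (1, "Ada", 1990, "555-0101", "mobile"),
    (2, "Bob", 1991, None, None),  # student without any phone numbers
]


def rows_to_documents(rows):
    """Group flat join rows into one nested document per student id."""
    docs = {}
    for student_id, name, year, number, phone_type in rows:
        # Create the parent document the first time this student is seen.
        doc = docs.setdefault(
            student_id,
            {"Name": name, "YearOfBirth": year, "PhoneNumbers": []},
        )
        # A LEFT JOIN yields NULLs for students that have no phone rows.
        if number is not None:
            doc["PhoneNumbers"].append({"Number": number, "Type": phone_type})
    return docs


documents = rows_to_documents(rows)
```

Each value in `documents` can then be serialized to JSON and indexed under its student id, with the inner objects kept as a proper array.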

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hiya

On Thu, 2013-03-28 at 20:25 -0700, string theory wrote:

From what I read, you can have a push model or a pull model for
updating indexes incrementally to keep the datastore and the index in
sync.
By pull, using the river, say JDBC if it is a sql DB or writing your
own river.

By push, implementing an indexer that updates the index using the API
calls.

My opinion is that rivers are an easy way to get started, but they have
no advantage over an external push process, and are often inflexible and
impose limitations that you could easily handle in an external push
process.

The only real difference between the two approaches is where the code
runs and how much control you have over it.

clint
