Thanks for the intel guys, much appreciated!
The framework I am trying to build is an ERP with pluggable modules
(accounting, warehouse, e-commerce), integrated through Spring
Integration, which lets me abstract the relations and business
processes, with a SmartGWT frontend. The domain model is reasonably simple,
and I created a model-driven architecture that lets me see domain
model changes reflected in the interface right away, using a few simple
annotations to drive the display logic.
Currently, the data persistence layer directs all data
modification operations to CouchDB (essentially POJO -> Jackson JSON
-> Ektorp -> CouchDB REST), and the changes in CDB are picked up by
the river and posted to ES (using automatically generated indexes and
river configuration through couchdb-river). Most of the time the NRT approach
works just fine, and in some cases I sync the indexes manually. So
effectively, 95% of the reads are handled by ES (which I can easily scale),
while CDB serves as the persistent data store, potentially providing
replication and failover. With a roughly 90/10 distribution of
read/write requests this works fine, since I can start any number of ES
instances, and CDB can pretty much handle the C(R)UD operations by itself.
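For reference, the generated river registration (the _meta document that gets PUT to /_river/&lt;name&gt;/_meta; the database, index names, and bulk settings here are just placeholders) looks something like this:

```json
{
  "type": "couchdb",
  "couchdb": {
    "host": "localhost",
    "port": 5984,
    "db": "erp"
  },
  "index": {
    "index": "erp",
    "type": "erp",
    "bulk_size": "100",
    "bulk_timeout": "10ms"
  }
}
```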
The problems with this approach, as I mentioned, are mostly related to the
lack of transactions in CDB - it's very hard to make sure multiple updates
don't break data integrity. So essentially, I would like to post the
changes to ES as soon as they are committed to BDB/JE, perhaps even
synchronously. Also, the data in ES is regarded as secondary, and can be wiped
to recreate the indexes from scratch if necessary.
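To illustrate the ordering I'm after, here is a minimal sketch, with a plain in-memory map standing in for BDB/JE and a callback standing in for the ES bulk call (all class and method names below are made up purely for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: propagate changes to the search index only after the
// primary store has committed. HashMap stands in for BDB/JE,
// Indexer for an ES bulk/index call -- names are illustrative only.
interface Indexer {
    void index(String id, String json);
}

class TransactionalStore {
    private final Map<String, String> store = new HashMap<String, String>(); // "BDB/JE"
    private final Indexer indexer;                                           // "ES"

    TransactionalStore(Indexer indexer) { this.indexer = indexer; }

    // Apply several updates atomically, then push them to the index.
    // If anything fails, we roll back and ES never sees the changes.
    void commit(Map<String, String> updates) {
        Map<String, String> snapshot = new HashMap<String, String>(store);
        try {
            store.putAll(updates);  // the "transaction"
        } catch (RuntimeException e) {
            store.clear();
            store.putAll(snapshot); // roll back to the pre-commit state
            throw e;
        }
        // Only reached after a successful commit: index synchronously.
        for (Map.Entry<String, String> e : updates.entrySet()) {
            indexer.index(e.getKey(), e.getValue());
        }
    }

    String get(String id) { return store.get(id); }
}
```

Since ES is secondary, a failed index call here could simply be retried or the index rebuilt, without touching the committed data.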
Regarding deployment, initially I planned to use a load balancer
(Apache), CDB on a dedicated machine, and pairs of ES+Tomcat
JVMs as workers. Now I am thinking about running ES+BDB/JE+my application
inside a single webapp in Tomcat, and using this unit as a
load balancer worker (with session replication). Easier deployment, easier
upgrades. I might even use Tomcat 7 parallel deployment (
http://java-monitor.com/forum/showthread.php?t=1288).
What do you think?
regards,
Andrius
On Saturday, May 25, 2013 1:56:31 AM UTC+3, Jörg Prante wrote:
I have experience with BDB JE from the very beginning, since the 1.x versions.
All you say is true, and I can understand your enthusiasm about Java having
a reliable key/value database. From a DBA point of view, ACID transactions
make it possible to implement apps with the valuable features required for
reliable data processing.
This is not a BDB JE list, so rather than discuss all the issues at full
length, I'll try to briefly sum things up.
BDB JE is not BDB. They are totally different implementations. BDB comes
with Java bindings, afaik, but it is not compatible with BDB JE. JE is the
"younger sister" of the full-fledged, famous BDB.
The JE advantage is that keys are kept sorted, so iteration over keys and
key-prefix lookups perform well, as long as you only read. It's quite easy to
push POJOs into JE. If your values are opaque, you can do with them as
you like (for output and so on). You must handle serialization of values
yourself; JE comes with some predefined serialization strategies and some
handy API features like cursors that work like queues. Secondary keys
also allow a kind of join operation between two databases.
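To illustrate the sorted-key point: a key-prefix lookup is just a range scan over the ordered keys. In plain Java collections terms it looks like this (TreeMap is only a stand-in for JE's key-ordered B-tree; with JE itself you would position a Cursor via getSearchKeyRange and iterate):

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Prefix scan over sorted keys, the kind of lookup JE's cursors make cheap.
// TreeMap stands in for JE's key-ordered storage -- an analogy, not the JE API.
class PrefixScan {
    static SortedMap<String, String> byPrefix(TreeMap<String, String> db, String prefix) {
        // All keys k with prefix <= k < prefix + '\uffff' share the prefix.
        return db.subMap(prefix, prefix + Character.MAX_VALUE);
    }
}
```

In JE this stays cheap because the keys are stored in order on disk; it is the mixed read/write case where performance drops.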
The JE disadvantages I can remember are poor performance on mixed
workloads, and that you can run it only on a single JVM. JE uses locking
and moves all data over the heap, and therefore it does not scale. It also
uses append-only structures (log-structured trees) and wastes a lot of
disk space in order to save time. The housekeeper (cleaner) thread tends
to be aggressive and may affect other tasks on the JVM.
For integrating JE with ES, you should consider the nature of your data:
do you want to keep a copy of all the data around? Then just combine your
push service to BDB JE with an ES bulk index call. But do you also need
regular deletes? BDB JE key deletion is easy (and slow), but it is a
challenge in ES, because ES cannot perform deletions in atomic steps
without degrading performance. So it depends on whether your requirements
allow you to live with stale data in ES for some seconds or minutes.
If your data should move straight one-way from JE to ES, you might
consider a river. But because of the custom serialization in JE, it is not
easy to JSONize the data without complex river configuration.
All in all, two systems are twice the work, so running both JE and ES adds
up in daily admin routines, development of recovery scripts, etc.
Jörg
On Friday, May 24, 2013 11:00:06 PM UTC+2, Andrius Juozapaitis wrote:
Hey,
I got fed up with CouchDB's lack of transactions, since the compromises
in the application logic to compensate for it became really
annoying. This led me to investigate a few other non-SQL datastores with
ACID transaction support, and well, I was pleasantly surprised. BerkeleyDB
Java Edition pretty much does it all - transactions, high availability,
redundancy, as well as a very neat API for storing POJOs (something along
these lines: https://gist.github.com/andriusj/d94e96c3082495001129). It
even has triggers!
While BDB/JE provides a nice way to query by primary key and
secondary indexes, I would like to keep all the uber-functionality that ES
provides. So I am calling for suggestions - since there are apparently no
official integrations yet, what would be the best approach to take? The
river interface? Triggers? Aspects?
regards,
Andrius Juozapaitis
ps. I would also love to hear if anyone has hands-on experience with
BDB - from the few brief tests I did, it seems to perform admirably, but I
may have missed some flaws hiding in plain sight.
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.