Berkeley DB Java edition river and/or other integration options?

Andrius_Juozapaitis · May 24, 2013, 9:00pm

Hey,

I got fed up with CouchDB's lack of transactions, since the compromises in
the design of the application logic to compensate for it became really
annoying. This led me to investigate a few other non-SQL datastores with
ACID transaction support, and well, I was pleasantly surprised. BerkeleyDB
Java Edition pretty much does it all: transactions, high availability,
redundancy, as well as a very neat API for storing POJOs (something along
these lines: https://gist.github.com/andriusj/d94e96c3082495001129). It
even has triggers!

While BDB/JE provides a nice way to query by PKs and secondary
indexes, I would like to keep all the uber functionality that ES provides.
So I am calling for suggestions: since there are apparently no official
integrations yet, what would be the best approach to take? River interface?
Triggers? Aspects?

regards,
Andrius Juozapaitis

ps. I would also love to hear if anyone has had any hands-on experience with
BDB. From the few brief tests I did it seems to perform admirably, but I may
have missed some flaws hiding in plain sight.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

brian_yoder · May 24, 2013, 10:02pm

Hi Andrius,

It's a rather long story, but the short version is: Yes, I've had extensive
experience (10 years) with Berkeley DB (C++ edition), starting when they
were known as Sleepycat. That product enabled me to build a custom search
engine that, well, made Lucene look like a toy. A shiny cool toy, but still
a toy. 20 million records were swallowed with ease. Queries could be an
arbitrarily complex mix of AND and OR with an arbitrarily deep nesting of
expressions, arguments could be geospatial, heavily truncated, with one
wildcard in any position (UNARY, U*, *Y, or UN*Y), and could be arranged in
any order (didn't need to juggle queries to put the slow stuff after the fast
stuff; internally it dynamically juggled itself). Query performance was
solely dependent on the number of responses: returning 1 record was just as
lightning fast whether the query was one term or several hundred with
geospatial circles or polygons. Hierarchies (parent/child) were built in: My
experience is that this is much better added in from the start than bolted on
later (much the same as ES's approach to adding replication and clustering
and failover from the start instead of bolting it on later). Ranking was
added on later, but it turned out to be very predictable and easy to control;
a good candidate for a late design addition. But all this was in a previous
life.

For reasons beyond my control, I was forced to drop BDB and use a poor
substitute that emulated its query and load APIs but not fully, nor its
replication APIs. So in that way, Lucene does about 95% of what I could do
(but usually much more slowly), but Elasticsearch's replication and
ES+Lucene updates are about 10,000% past what I did.

I'm guessing that the Berkeley DB Java Edition could be used to recreate in
Java what I did in C++. But life has moved on, and I find Elasticsearch to
be breathtakingly awesome and Lucene to have enough capabilities and tricks
up its sleeve to be plenty useful with some up-front query design. And it's
much better supported and getting better all the time, which is why I've
settled on it for now and the foreseeable future. I owe a huge debt of
gratitude to the great people on this forum for all their help, patience,
and suggestions!

Brian

jprante · May 24, 2013, 10:56pm

I have experience with BDB JE from the very beginning, since version 1.x.

All you say is true and I can understand your enthusiasm about Java having
a reliable key/value database. From a DBA's point of view, ACID transactions
make it possible to implement apps with valuable features required for
reliable data processing.

This is not a BDB JE list, so rather than discuss all the issues at full
length I will try to sum them up briefly.

BDB JE is not BDB. These are totally different implementations. BDB comes
with Java bindings afaik, but it is not compatible with BDB JE. JE is the
"younger sister" of the full-fledged, famous BDB.

The JE advantage is having keys sorted, so iteration over keys and key
prefix lookup perform well as long as you only read. It's quite easy to
push POJOs into JE. If your values are opaque, you can mess with them as
you like (for output and so on). You must add serialization of values
yourself, though JE comes with some predefined serialization strategies and
some handy API features like cursors that work like queues. Secondary keys
also allow a kind of join operation between two databases.
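
To make the sorted-key point concrete, here is a minimal sketch of a prefix lookup over ordered keys. It deliberately does not use the JE API: a `java.util.TreeMap` stands in for the B-tree, and the class name, method name, and sample keys are all made up for illustration. The idea is the same as with a JE cursor: seek to the first key at or after the prefix, then walk forward until the keys stop matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative stand-in for a sorted key/value store (not the JE API). */
final class SortedKeyScan {

    /** Returns the values of all keys starting with the prefix, in key order. */
    static List<String> valuesWithPrefix(TreeMap<String, String> store, String prefix) {
        List<String> result = new ArrayList<>();
        // Seek to the first key >= prefix, then walk forward like a cursor.
        for (Map.Entry<String, String> e : store.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so the first mismatch ends the range
            }
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<>();
        store.put("invoice:0001", "Acme");
        store.put("invoice:0002", "Globex");
        store.put("order:0001", "Initech");
        System.out.println(valuesWithPrefix(store, "invoice:")); // [Acme, Globex]
    }
}
```

The scan touches only the matching keys plus one extra, which is why prefix and range reads stay cheap on a sorted store regardless of how many other keys it holds.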

The JE disadvantages I can remember are poor performance on mixed workloads
and that you can run it only on a single JVM. JE uses locking, moves all
data over the heap, and therefore it does not scale. It also uses
append-only structures (log-structured trees) and wastes a lot of space on
disk in order to save time. The housekeeper thread tends to be aggressive
and may influence other tasks on the JVM.

For integrating JE with ES, you should consider the nature of your data:
do you want to keep a copy of all the data around? Then just combine your
push service to BDB JE with an ES batch index call. But do you also need
regular deleting? BDB JE key deletion is easy (and slow), but it is a
challenge in ES because ES does not perform deletions in atomic steps
without degradation of performance. So it depends on whether your
requirements allow you to live with stale data in ES for some seconds or
minutes.

If your data should move straight one-way from JE to ES, you might consider
a river. Because of the custom serialization in JE, it is not easy to
JSONize the data without complex river configuration.

All in all, two systems are twice the work, so running both JE and ES adds
up in daily admin routines, development of recovery scripts, etc.

Jörg

brian_yoder · May 25, 2013, 2:58am

Hi Jörg,

Very interesting details. Since I've never used BDB JE, it's good to hear
about your experiences.

One thing that allowed me to use BDB to blow the doors off of anything else
including Lucene was (to my recollection; this is going back 3 years) what
they called "duplicates". In other words, a single key could point to many
different values, and the values were always maintained (created, inserted)
in order. And then you could find the first value by EQ key or GE key (the
first for equality; the second for range). But then the magic started. I
could have a key and a value and search for EQ key and GE value to find the
next value for that key that was greater than or equal to the specified
value.

Now the database built on top of BDB started to sing and dance. I could
have a complex query return the 4 matches in a California database in 8 ms
(on a slow ancient Unix server, and with about 12 M records). But every
record was indexed with state=CA. So when I added state=CA to the query, it
slowed down to only 12 ms, no matter where it was placed in the AND
operator. This is because of the ability to search for a key and a value
GE some value.

A key was generally a Unicode collation key with some character mapping. A
value was a record number (ID to ES). So skip logic could be applied and
only a very very tiny fraction of those 8 M state=CA values would need to
be visited. No bitset that had to scan all 8 million records. A magic skip
would rip through it faster than a bitset ever could.
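
The skip logic described above can be sketched as follows. This is only an illustration of the idea, not the original C++ engine and not a BDB API: each key's duplicate values are modelled as a sorted `NavigableSet<Long>` of record numbers, and `ceiling()` plays the role of the "EQ key, GE value" seek. The class and variable names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Illustrative AND of two sorted posting lists using "seek to >= value". */
final class SkipIntersect {

    static List<Long> intersect(NavigableSet<Long> a, NavigableSet<Long> b) {
        List<Long> hits = new ArrayList<>();
        Long candidate = a.isEmpty() ? null : a.first();
        while (candidate != null) {
            Long inB = b.ceiling(candidate);   // smallest value in b that is >= candidate
            if (inB == null) {
                break;                         // b is exhausted, no more matches possible
            }
            if (inB.equals(candidate)) {
                hits.add(candidate);           // present in both lists
                candidate = a.higher(candidate);
            } else {
                candidate = a.ceiling(inB);    // skip a forward to where b already is
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<Long> stateCa = new TreeSet<>();
        for (long id = 0; id < 100_000; id++) {
            stateCa.add(id);                   // every record carries state=CA
        }
        NavigableSet<Long> rareTerm = new TreeSet<>(List.of(5L, 900L, 4_200L, 77_000L));
        System.out.println(intersect(rareTerm, stateCa));
        System.out.println(intersect(stateCa, rareTerm)); // same result in either order
    }
}
```

Both calls print `[5, 900, 4200, 77000]`. In either argument order the loop performs a handful of seeks proportional to the small list, rather than visiting every entry of the large one, which is the effect the post attributes to searching by key and GE value.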

But B-trees don't scale so easily for the massive update performance that
ES+Lucene supports. It would be nice to marry the two concepts together.
sigh

Brian

Andrius_Juozapaitis · May 25, 2013, 7:41am

Thanks for the intel, guys, much appreciated!

The framework I am trying to build is an ERP with pluggable modules
(accounting, warehouse, e-commerce), integrated through Spring framework
integration, which allows me to abstract the relations and business
processes, and a SmartGWT frontend. The domain model is reasonably simple,
and I created a model driven architecture, which allows me to see the
domain model changes in the interface right away by using a few simple
annotations to drive the display logic.

Currently, the data persistence layer works by directing all data
modification operations to CouchDB (essentially, POJO->Jackson JSON
lib->Ektorp->CouchDB REST), and the changes in CDB are picked up by
the river and posted to ES (using automatically generated indexes and
river configuration through couchdb-river). Most of the time, the NRT
approach works just fine, and in some cases I sync the indexes manually. So
effectively, 95% of the reads are handled by ES (which I can easily scale),
while CDB serves as a persistent data store, potentially providing
replication and failover capabilities. With a 90/10 distribution of
read/write requests, this works fine, since I can start any number of ES
instances, and CDB can pretty much handle the C(R)UD operations by itself.

The problems with this approach, as I mentioned, are mostly related to the
lack of transactions in CDB: it's very hard to make sure multiple updates
don't break data integrity. So essentially, I would like to post the
changes to ES as soon as they are committed to BDB/JE, perhaps even
synchronously. Also, data in ES is regarded as secondary, and can be wiped
out to recreate the indexes from scratch, if necessary.
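
A rough sketch of that "commit first, then index" flow, under loud assumptions: neither the JE nor the ES client API is used here. `PrimaryStore`, `SearchIndex`, and `InMemoryIndex` are hypothetical in-memory stand-ins, and the "transaction" is simulated by validating a whole batch before swapping it in. It only illustrates the ordering and the rebuild path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative write path: commit to the primary store, then push to the index. */
final class CommitThenIndex {

    /** Hypothetical stand-in for the search side (ES in this thread). */
    interface SearchIndex {
        void index(String id, String json);
        void clear();
    }

    static final class InMemoryIndex implements SearchIndex {
        final Map<String, String> docs = new HashMap<>();
        @Override public void index(String id, String json) { docs.put(id, json); }
        @Override public void clear() { docs.clear(); }
    }

    /** Hypothetical stand-in for the transactional store (BDB/JE in this thread). */
    static final class PrimaryStore {
        private Map<String, String> committed = new TreeMap<>();

        /** All-or-nothing: every change is validated before any becomes visible. */
        synchronized void commit(Map<String, String> changes) {
            Map<String, String> next = new TreeMap<>(committed);
            for (Map.Entry<String, String> e : changes.entrySet()) {
                if (e.getValue() == null || e.getValue().isEmpty()) {
                    throw new IllegalArgumentException("empty document for " + e.getKey());
                }
                next.put(e.getKey(), e.getValue());
            }
            committed = next; // reached only if the whole batch was valid
        }

        synchronized Map<String, String> snapshot() {
            return new TreeMap<>(committed);
        }
    }

    private final PrimaryStore store;
    private final SearchIndex index;

    CommitThenIndex(PrimaryStore store, SearchIndex index) {
        this.store = store;
        this.index = index;
    }

    /** Synchronous variant: the index sees a change only after it has committed. */
    void save(Map<String, String> changes) {
        store.commit(changes);         // throws on failure, so nothing reaches the index
        changes.forEach(index::index);
    }

    /** The index is secondary: wipe it and replay everything from the store. */
    void rebuildIndex() {
        index.clear();
        store.snapshot().forEach(index::index);
    }
}
```

One caveat with this ordering: if the indexing step fails after the commit, the index lags behind the store until the push is retried or `rebuildIndex()` is run, which is acceptable only because the index is treated as disposable.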

Regarding the deployment, initially I planned to use a load balancer
(Apache), a CDB on another dedicated machine, and start pairs of ES+Tomcat
JVMs as workers. Now, I am thinking about using ES+BDB/JE+my application in
a Tomcat in the context of a single webapp, and using this unit as a
load balancer worker (with session replication). Easier deployment, easier
upgrades. Might even use Tomcat 7 parallel deployment (
http://java-monitor.com/forum/showthread.php?t=1288).

What do you think?

regards,
Andrius

jprante · May 25, 2013, 11:08am

Many years have passed since BDB JE was developed and was one of my
favorite solutions for key/value storage.

So maybe you should also look at Redis (http://redis.io) and the Redis
river for Elasticsearch (leeadkins/elasticsearch-redis-river on GitHub)
for a more up-to-date solution.

You will be able to attach clients in multiple languages to Redis, so it is
truly polyglot, similar to ES clients, and that is always good for team
work. Redis has many hot features like sorting, transactions,
pipelining, pubsub, sharding, pooling... plus you get HA by master/slave
and failover, which is not available in BDB JE. Note that Redis
transactions provide only isolation but no rollback, since they use
optimistic locking, so to a classic DBA they might look weird.

With Jedis (redis/jedis on GitHub, a Redis Java client) and JOhm
(xetorthio/johm on GitHub, an object-hash mapping library for Java for
storing objects in Redis) you can get very far playing with Java POJOs
and doing Hibernate/ORM-like stuff.

Jörg
