I would like to have a river in reverse. Every time a document is inserted
or modified I would like to push that into another destination like a
database. Ideally this would be async or maybe even in batches.
Note that connecting a database to ES internals is generally a bad idea.
The challenge is performance and transaction safety. The ES API can
process many hundreds of thousands of actions per second in unspecified
order, like a "data hose". No database that I know of can keep up with
such a pace. The price would be a slowdown of ES throughput (async also
means a higher cost of CPU for threads and RAM for the message queue on
the same machine, which would compete heavily with ES for resources).
From the viewpoint of designing such an architecture, never do a "reverse
river" for "every time a document is inserted or modified". Instead, just
push the doc to ES and to the database concurrently from your middleware,
and to the database only if necessary, i.e. after applying a filter
condition to your doc. This would also maintain scalability and transaction
safety, since there is no guarantee that the ES transport layer does not
drop (or repeat) transmissions at any time.
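
To make the middleware idea concrete, here is a minimal sketch, not a definitive implementation. It assumes the two destinations are plain callables; `index_doc`, `store_doc`, `should_store` and the sample documents are hypothetical names invented for illustration, and in practice they would wrap your ES client and your database driver.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def dual_write(doc, index_doc, store_doc, should_store, pool):
    """Push doc to the search index, and to the database only if the filter matches."""
    futures = [pool.submit(index_doc, doc)]
    if should_store(doc):
        futures.append(pool.submit(store_doc, doc))
    wait(futures)
    # result() re-raises any exception from a sink, so the caller can retry.
    return [f.result() for f in futures]


# Stand-in sinks (assumptions): replace with real ES and database calls.
indexed, stored = [], []


def index_doc(doc):
    indexed.append(doc)
    return "indexed"


def store_doc(doc):
    stored.append(doc)
    return "stored"


def should_store(doc):
    # Hypothetical filter condition: only orders go to the database.
    return doc.get("type") == "order"


docs = [
    {"id": 1, "type": "order"},
    {"id": 2, "type": "log"},
    {"id": 3, "type": "order"},
]

with ThreadPoolExecutor(max_workers=2) as pool:
    results = [dual_write(d, index_doc, store_doc, should_store, pool) for d in docs]
```

Because the middleware owns both writes, a failure in either sink surfaces in the caller, which is where a retry or compensation can be decided.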
Besides these design questions, there is a new "mock transport" that has
been added in 1.2.0. A nice side effect is that plugin authors should be
able to completely exchange the transport module or wrap the existing one
with a logging facility. I am thinking about a syslog client for revisable
logs that captures certain ES transport actions and writes them into an
append-only log on a central log server, so it's up to the syslog server to
keep up with the pace. With a bit of filtering it could be possible to
capture certain documents and log them. Whether this can work out is still
totally unclear, I'm still in the process of designing.
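
The wrapping idea can be illustrated outside of ES. The following is only a sketch of the pattern (wrap a send function, filter, append to a log), not the ES transport plugin API: `LoggingTransport`, `inner_send`, the action names and the `ListHandler` are all hypothetical. In a real setup the handler could be swapped for the standard library's `logging.handlers.SysLogHandler` pointed at a central log server.

```python
import json
import logging


class ListHandler(logging.Handler):
    """Collects formatted records in memory; stands in for a syslog handler."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


class LoggingTransport:
    """Wraps an existing send function and logs the actions that pass a filter."""

    def __init__(self, inner, logger, should_log):
        self.inner = inner
        self.logger = logger
        self.should_log = should_log

    def send(self, action, payload):
        if self.should_log(action, payload):
            entry = {"action": action, "payload": payload}
            self.logger.info(json.dumps(entry, sort_keys=True))
        # Always delegate: logging must not change what is transmitted.
        return self.inner(action, payload)


def inner_send(action, payload):
    # Hypothetical stand-in for the real transport.
    return "sent:" + action


handler = ListHandler()
logger = logging.getLogger("transport-audit")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

# Filter (assumption): capture only index actions, ignore everything else.
transport = LoggingTransport(inner_send, logger, lambda action, payload: action == "index")
first = transport.send("index", {"id": 1})
second = transport.send("search", {"q": "foo"})
```

The wrapper never blocks or alters the delegated call, so keeping up with the pace becomes the log sink's problem, as described above.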