I am considering using a River as the primary data input mechanism for my
business. The River will pull data AND persist the data to Amazon S3. Do
you know of anyone trusting their mission-critical data to the river?
The alternative is to write my own reader that persists the data and then
pushes it to S3, outside of an elasticsearch node. I am attracted to the
option of putting my code inside a river because of the fail-over mechanism
that the river provides. Are you using this feature in production for
mission-critical data?
Our preference has been not to use the rivers. We find an external
process more flexible, and it provides more control. For fail-over, we've
implemented our own mechanism, typically designing external processes in an
active-active configuration for both load balancing and fault tolerance.
Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype
In our case we need to consume a feed of document changes and we can't
lose any message, so we ended up implementing our own custom job for
retrying and logging on every possible failure.
Of course, this only applies to our specific needs; on the other hand,
we are indexing documents (about 50/sec) one by one, which I guess is
less efficient and less standard than using bulk indexing with rivers.
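A retry-and-log job like the one described above could be sketched as a small wrapper around each single-document indexing call. This is only an illustration of the pattern, not the poster's actual code; the flaky action in `main` stands in for a hypothetical `indexDoc` call:

```java
import java.util.function.Supplier;

// Sketch of a retry-with-logging wrapper for one-by-one indexing.
// All names here are made up for illustration.
public class RetryingIndexer {

    // Retries the given action up to maxAttempts times, logging each
    // failure and backing off a little longer after each attempt.
    static <T> T withRetry(Supplier<T> action, int maxAttempts, long backoffMs) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                System.err.println("attempt " + attempt + " failed: " + e.getMessage());
                try {
                    Thread.sleep(backoffMs * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            }
        }
        // All attempts exhausted; the caller can persist the document
        // somewhere durable (e.g. S3) for later replay instead.
        throw last;
    }

    public static void main(String[] args) {
        // Simulated flaky indexing call: fails twice, then succeeds.
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "indexed";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

At ~50 docs/sec the per-document overhead of this pattern is tolerable; at higher rates, batching failures and replaying them via the bulk API would make more sense.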
Thanks Berkay and Frederic for the info. I think you have identified
a hole in using Rivers for me: what to do with the incoming data should the
process fail to consume it, either when writing it to S3 or to the index.
Can you suggest any specific tools for implementing a simple active-passive
configuration in Java on AWS?
I have not used anything AWS-specific. One approach is to have the active
instance send a heartbeat to the passive instance as it processes data. If
the passive instance stops receiving the heartbeat, it does another
check to see whether the active is really down, and if so, it takes
over. It's best to make that decision based on application-level checks
rather than simply checking whether the server or process is running.
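The passive side of that heartbeat check could look something like the sketch below. This is a minimal illustration under my own assumptions (class and method names are invented); in practice the heartbeat would travel over the network, e.g. via a queue or a shared data store:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the passive instance's heartbeat bookkeeping.
// All names are hypothetical; timestamps are plain epoch millis.
public class HeartbeatMonitor {
    private final AtomicLong lastHeartbeatMs = new AtomicLong();
    private final long timeoutMs;

    HeartbeatMonitor(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called whenever a heartbeat arrives from the active instance.
    void onHeartbeat(long nowMs) {
        lastHeartbeatMs.set(nowMs);
    }

    // True when the active instance has been silent longer than the timeout.
    // Before actually taking over, re-verify with an application-level check
    // (e.g. is the active still writing data?) to avoid split-brain.
    boolean shouldTakeOver(long nowMs) {
        return nowMs - lastHeartbeatMs.get() > timeoutMs;
    }

    public static void main(String[] args) {
        HeartbeatMonitor m = new HeartbeatMonitor(5_000);
        m.onHeartbeat(0);
        System.out.println(m.shouldTakeOver(3_000));  // within timeout
        System.out.println(m.shouldTakeOver(10_000)); // heartbeat missed
    }
}
```

The timeout should be several multiples of the heartbeat interval so that a single delayed message doesn't trigger a spurious take-over.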
Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype