I am considering using a River as the primary data input mechanism for my
business. The River will pull data AND persist the data to Amazon S3. Do
you know of anyone trusting their mission-critical data to the river?
The alternative is to write my own reader that persists the data and then
pushes it to S3, outside of an elasticsearch node. I am attracted to the
option of putting my code inside a river because of the fail-over mechanism
that the river provides. Are you using this feature in production for
mission-critical data?
Our preference has been not to use the rivers. We find an external
process more flexible, and it provides more control. For fail-over, we've
implemented our own mechanism, typically designing external processes in an
active-active configuration for both load balancing and fault tolerance.
Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype
In our case we need to consume a feed of document changes and we can't
lose any message, so we ended up implementing our own custom job for
retrying and logging on every possible failure.
Of course, this only applies to our specific needs; on the other hand,
we are indexing documents (about 50/sec) one by one, which I guess is
less efficient and less standard than using bulk indexing with rivers.
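A retry-and-log job like the one described above could be sketched as a small wrapper around each single-document indexing call. This is only an illustration of the pattern, not the poster's actual code; the flaky action in `main` stands in for a hypothetical `indexDoc` call:

```java
import java.util.function.Supplier;

// Sketch of a retry-with-logging wrapper for one-by-one indexing.
// All names here are made up for illustration.
public class RetryingIndexer {

    // Retries the given action up to maxAttempts times, logging each
    // failure and backing off a little longer after each attempt.
    static <T> T withRetry(Supplier<T> action, int maxAttempts, long backoffMs) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                System.err.println("attempt " + attempt + " failed: " + e.getMessage());
                try {
                    Thread.sleep(backoffMs * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            }
        }
        // All attempts exhausted; the caller can persist the document
        // somewhere durable (e.g. S3) for later replay instead.
        throw last;
    }

    public static void main(String[] args) {
        // Simulated flaky indexing call: fails twice, then succeeds.
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "indexed";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

At ~50 docs/sec the per-document overhead of this pattern is tolerable; at higher rates, batching failures and replaying them via the bulk API would make more sense.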
Thanks Berkay and Frederic for the info. I think you have identified
a hole in using Rivers for me: what to do with the incoming data should the
process fail to consume it, either when writing it to S3 or to the index.
Can you suggest any specific tools for implementing a simple active-passive
configuration in Java on AWS?
I have not used anything AWS-specific. One approach is to have the active
instance send a heartbeat to the passive instance as it processes data. If
the passive instance stops receiving the heartbeat, it does another
check to see whether the active is really down, and if so, it takes
over. It's best to make that decision based on application-level checks
rather than simply checking whether the server or process is running.
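The passive side of that heartbeat check could look something like the sketch below. This is a minimal illustration under my own assumptions (class and method names are invented); in practice the heartbeat would travel over the network, e.g. via a queue or a shared data store:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the passive instance's heartbeat bookkeeping.
// All names are hypothetical; timestamps are plain epoch millis.
public class HeartbeatMonitor {
    private final AtomicLong lastHeartbeatMs = new AtomicLong();
    private final long timeoutMs;

    HeartbeatMonitor(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called whenever a heartbeat arrives from the active instance.
    void onHeartbeat(long nowMs) {
        lastHeartbeatMs.set(nowMs);
    }

    // True when the active instance has been silent longer than the timeout.
    // Before actually taking over, re-verify with an application-level check
    // (e.g. is the active still writing data?) to avoid split-brain.
    boolean shouldTakeOver(long nowMs) {
        return nowMs - lastHeartbeatMs.get() > timeoutMs;
    }

    public static void main(String[] args) {
        HeartbeatMonitor m = new HeartbeatMonitor(5_000);
        m.onHeartbeat(0);
        System.out.println(m.shouldTakeOver(3_000));  // within timeout
        System.out.println(m.shouldTakeOver(10_000)); // heartbeat missed
    }
}
```

The timeout should be several multiples of the heartbeat interval so that a single delayed message doesn't trigger a spurious take-over.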
Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype