Because a river is essentially an external process that runs inside a node to do something other than indexing and searching.
Imagine that you want to run OCR on PDF documents. You know that this is really intensive in terms of CPU usage, right?
Does it make sense to have that heavy process running in an elasticsearch node?
It could be better to have that process outside elasticsearch itself.
Rivers are nice when you discover elasticsearch. My personal experience is that you often move from rivers to another process (batch, ETL, logstash…) to get finer control over that process.
And a river is a singleton. It does not scale.
I think that's what Adrien explained.
My 2 cents.
--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs
On 23 Sept. 2013 at 22:46, Brian Gadoury bgadoury@endpoint.com wrote:
I'm not sure why Adrien recommends against using a River. "Synchronizing the content of a database with Elasticsearch" is exactly what rivers do. They are also balanced and recoverable just like ES shards.
To see if a river has processed all the changes in a database, I have a script that compares the river's last_seq with the database's update_seq (for our CouchDB river).
If they match, your river is up to date. If your river's last_seq is lower than your database's update_seq, then your river is not up to date yet.
You can also query that river doc on a loop to determine if your river is doing anything or if it's idle.
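The comparison described above could be sketched roughly as follows. This is not the original script, only an illustration under several assumptions: that the CouchDB river keeps its status in a document at `_river/<river>/_seq` with a `couchdb.last_seq` field, that the database info returned by CouchDB exposes `update_seq`, and that sequence numbers are plain integers (as in CouchDB 1.x). The URLs, river name and database name are placeholders.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urlopen(url) as resp:
        return json.load(resp)


def river_status(es_url, couch_url, river, db, fetch=fetch_json):
    """Compare the river's last_seq with the database's update_seq.

    Assumption: the river status document lives at _river/<river>/_seq
    and stores the sequence under _source.couchdb.last_seq.
    """
    river_doc = fetch(f"{es_url}/_river/{river}/_seq")
    last_seq = int(river_doc["_source"]["couchdb"]["last_seq"])

    # GET /<db> on CouchDB returns database info, including update_seq.
    update_seq = int(fetch(f"{couch_url}/{db}")["update_seq"])

    return {
        "last_seq": last_seq,
        "update_seq": update_seq,
        # Up to date once the river has caught up with the database.
        "up_to_date": last_seq >= update_seq,
    }
```

Calling `river_status("http://localhost:9200", "http://localhost:5984", "my_river", "my_db")` in a loop with a short sleep would also show whether `last_seq` is still moving or the river is idle.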
-Brian
On Monday, September 23, 2013 9:47:46 AM UTC-6, Adrien Grand wrote:
Hi Didier,
On Mon, Sep 23, 2013 at 5:00 PM, boeledi didier....@gmail.com wrote:
Is there any means for a River to notify the database as soon as fetched records have been processed? This would let us know that both the database and ES are synchronised...
If you are working on synchronizing the content of a database with Elasticsearch, I would recommend not using rivers at all but just writing a simple script that would fetch rows from the database and push them to Elasticsearch. This often ends up being simpler, and in your case it would make it possible to let the database know that the import is finished without relying on the existence of a specific River API.
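A minimal sketch of such a script, to make the idea concrete. Everything specific here is hypothetical: the `articles` table with `id`, `title` and `synced` columns, the index and type names, and the use of SQLite as a stand-in for the real database. It only assumes that Elasticsearch accepts newline-delimited JSON on its `_bulk` endpoint.

```python
import json
import sqlite3
from urllib.request import Request, urlopen


def bulk_body(rows, index, doc_type):
    """Build a newline-delimited _bulk payload from (id, title) rows."""
    lines = []
    for row_id, title in rows:
        # Action line, then the document source line.
        # "_type" matches older Elasticsearch releases; drop it on versions
        # that no longer use mapping types.
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": row_id}}))
        lines.append(json.dumps({"title": title}))
    return "\n".join(lines) + "\n"


def post_bulk(es_url, body):
    """Send the payload to Elasticsearch and return the decoded response."""
    req = Request(f"{es_url}/_bulk", data=body.encode("utf-8"),
                  headers={"Content-Type": "application/x-ndjson"},
                  method="POST")
    with urlopen(req) as resp:
        return json.load(resp)


def sync_pending(conn, send, index="articles", doc_type="article"):
    """Push unsynced rows, then tell the database they were imported."""
    rows = conn.execute(
        "SELECT id, title FROM articles WHERE synced = 0 ORDER BY id"
    ).fetchall()
    if not rows:
        return 0

    result = send(bulk_body(rows, index, doc_type))
    if result.get("errors"):
        raise RuntimeError("bulk request reported errors; rows left unsynced")

    # This is the step a river cannot do for you: mark the rows as done.
    conn.executemany("UPDATE articles SET synced = 1 WHERE id = ?",
                     [(row_id,) for row_id, _ in rows])
    conn.commit()
    return len(rows)
```

It could be run from cron as `sync_pending(sqlite3.connect("app.db"), lambda body: post_bulk("http://localhost:9200", body))`. Passing the sender in as a function keeps the database logic testable without a running cluster.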
--
Adrien Grand
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.