I am currently working on a prototype for indexing aggregated syslog-ng log
messages. My current design features a syslog-ng river (which I implemented)
that internally runs a syslog4j server to consume and index log messages
(via TCP or UDP). This is working great so far.
However, I have second thoughts about my choice of using a river. From the
river's perspective, the communication is a push model (the syslog daemon or
apps push data to ES to be consumed by the river). I therefore do not
really benefit from the great river features (i.e. state management by ES).
If the node running the river dies, I am sure ES will restart the river
on another node. However, this does not really help, as the entity pushing
messages will still target the old node (TCP connection). Also, since the
river is only receiving there is little benefit in tracking the last
Is my assumption correct that a river is primarily useful in scenarios
where ES would be pulling data? If so, any ideas on how to better solve
the log indexing problem?