Using River for indexing log messages

I am currently working on a prototype for indexing aggregated syslog-ng log
messages. My current design features a syslog-ng river (which I implemented)
that internally runs a syslog4j server to consume and index log messages
(via TCP or UDP). This is working great so far.
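
For illustration only, the consume-and-index step can be sketched roughly as
below. This is not the river's actual code (which is Java and uses syslog4j);
it is a minimal Python sketch that assumes RFC 3164-style lines and builds a
newline-delimited body in the shape the ES bulk API expects. The names
parse_syslog and to_bulk_body, the index name "syslog" and the type name
"message" are all made up for this example.

```python
import json
import re

# Assumed input shape (RFC 3164 style): "<PRI>Mmm dd hh:mm:ss host message"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<message>.*)$"
)


def parse_syslog(line):
    """Turn one syslog line into a dict; fall back to the raw text."""
    text = line.strip()
    match = SYSLOG_RE.match(text)
    if match is None:
        # Unparseable lines are kept as-is rather than dropped.
        return {"message": text}
    pri = int(match.group("pri"))
    return {
        "facility": pri // 8,  # PRI = facility * 8 + severity
        "severity": pri % 8,
        "timestamp": match.group("timestamp"),
        "host": match.group("host"),
        "message": match.group("message"),
    }


def to_bulk_body(lines, index="syslog", doc_type="message"):
    """Build a bulk request body: one action line, then one source line."""
    out = []
    for line in lines:
        # "syslog" and "message" are hypothetical index/type names.
        out.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        out.append(json.dumps(parse_syslog(line)))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(out) + "\n"
```

The resulting string would be sent as the body of a bulk request; batching
several messages per request is the usual reason to go through the bulk
endpoint rather than indexing one document at a time.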

However, I have second thoughts about my choice of using a river. From the
river's perspective, the communication is a push model (syslog daemons or
apps pushing data to ES to be consumed by the river). I therefore do not
really benefit from the great river features (i.e. state management by ES).
If the node running the river dies, I am sure ES will restart the river
on another node. However, this does not really help, as the entity pushing
messages will still target the old node (TCP connection). Also, since the
river is only receiving, there is little benefit in tracking the last
message seen.

Is my assumption correct that a river is primarily useful in scenarios
where ES would be pulling data? If yes, any ideas on how to better solve
the log indexing problem?

Have you looked at logstash?

Berkay

On Thursday, January 5, 2012, Jan Fiedler fiedler.jan@gmail.com wrote:

--
Regards,
Berkay Mollamustafaoglu
Ph: +1 (571) 766-6292
mberkay on yahoo, google and skype