Design help: What do I gain with Redis?


(thealy) #1

Running Elasticsearch version 0.90.3 on 10 nodes. Two indexes in use with
replication 4, shards 5. Just over 200 million records total so far.
Records are email and HTTP proxy log information, sent directly from
syslog-ng to dedicated Java socket-based apps.

I've been using a TCP feed from a central logging machine to home-grown
apps that parse and combine records before inserting them into ES via the
Java API. I'm about to add several new feeds using the same plan. But I'm
wondering about the role that Redis played in the initial test
installation. Should I be feeding the log stream via Redis? What do I gain?
Buffering? And if I add it into the flow, should it be before or after my
parsing application?

Thanks for any help.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

Not sure about your requirements, and good answers about Redis are rare on an
ES list. From my understanding, Redis is not really a buffer; it is a
persistent queue that supports a Babylonian plenitude of clients. So if
you want a (central) store with transactional data flow, available for post
processing beside ES, with your devs and sysops loving polyglot
architectures, Redis may be the answer.

If you just want to collect and index log messages for yourself, much simpler
setups can be imagined. Solutions like logstash or rsyslog are available.
As you also like to grow your own apps at home, there is nothing to
prevent you from writing a plugin for ES that starts a syslog daemon, parses
the log messages, and indexes them :wink:

Jörg



(Mark Walkom) #3

Redis isn't persistent; see
http://en.wikipedia.org/wiki/Redis#Persistence for details on what it
means in Redis terms. The default behaviour of
Redis is not to store anything outside of memory.

It's used primarily as an in-memory, temporary key/value store that
provides very fast access. Logstash's recommended setup uses Redis as a
queuing service between its various instances; however, it used to use an
AMQP system, RabbitMQ, which provides actual persistence. I'm not sure why
they swapped, though.

If you are taking syslog-ng data and then parsing it in a custom Java
app, then Logstash might be a viable alternative, as it will save you a
bunch of work.

All that aside, can you clarify what you mean by "I'm wondering about the
role that Redis played in the initial test installation", as that isn't
really explained. Why did you install redis originally?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Jörg Prante) #4

To clarify, I'm a Redis user, and Redis is persistent, of course. From the
config file: "By default Redis asynchronously dumps the dataset on disk."
You can even choose between different persistence modes if you want to trade
durability against performance. And yes, when I kill the server or reboot, all
the data is still there after the restart.
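
For anyone following the persistence point, here is a rough sketch of the redis.conf directives involved. The values are illustrative only and are not taken from anyone's setup in this thread; check the redis.conf shipped with your version for its actual defaults.

```conf
# Snapshotting (RDB): dump the dataset to disk in the background when
# at least <changes> keys have changed within <seconds> seconds.
save 900 1
save 300 10
save 60 10000

# Append-only file (AOF): log every write operation to disk.
# Assumption for this sketch: turned on here to narrow the data-loss window.
appendonly yes

# How often the AOF is fsynced: always (safest, slowest),
# everysec (a common compromise), or no (leave it to the OS).
appendfsync everysec
```

With snapshots alone, writes made since the last dump can be lost on a crash, which may be what the "not persistent" remark earlier in the thread was getting at.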

Redis can be configured for higher throughput than RabbitMQ, which might be
the reason logstash recommends Redis.

Jörg



(shadyabhi) #5

Yes, it performs better than RabbitMQ. Most logstash users use
Redis as a queue. It helps in building the infra with a multiple
producer, multiple consumer model. So, if the task of creating the JSON
is CPU intensive and one machine can't handle it, you can start doing it
on 2 machines.
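
A minimal sketch of that multiple-consumer idea, in Python for brevity. Everything here is an assumption for illustration: `queue.Queue` stands in for a Redis list (where a producer would typically LPUSH raw lines and each parser would BRPOP them), and `parse_line` with its `host status bytes` format is hypothetical, not the poster's actual log format.

```python
import json
import queue
import threading

# Stand-in for a Redis list used as a queue. queue.Queue is used only so
# the sketch runs without a server; it is not a Redis client.
raw_lines = queue.Queue()
parsed_docs = queue.Queue()

STOP = None  # sentinel telling a worker to exit


def parse_line(line):
    """Hypothetical parser: turn 'host status bytes' into a JSON string."""
    host, status, size = line.split()
    return json.dumps({"host": host, "status": int(status), "bytes": int(size)})


def worker():
    """Consume raw lines until the sentinel arrives, emitting JSON strings."""
    while True:
        line = raw_lines.get()
        if line is STOP:
            break
        parsed_docs.put(parse_line(line))


def run(lines, n_workers=2):
    """Feed lines through n_workers parallel parsers and return the JSON docs."""
    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in workers:
        t.start()
    for line in lines:
        raw_lines.put(line)
    for _ in workers:
        raw_lines.put(STOP)  # one sentinel per worker
    for t in workers:
        t.join()
    docs = []
    while not parsed_docs.empty():
        docs.append(parsed_docs.get())
    return docs
```

In this layout the queue sits before the parsers, so adding a second parsing machine would only mean starting another consumer on the same list. Output order is not guaranteed once more than one worker is running.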

Hope it helps.


--
Regards,
Abhijeet Rastogi (shadyabhi)



(thealy) #6

Thank you all for your feedback about using Redis with ES. It seems that
for my application it will be appropriate to add it when I either have
multiple consumers or need to apply multiple servers to parse a given
log stream for insertion as JSON.


