Sizing for UDP/TCP (syslog type)

Hi.

Does anyone have any sizing knowledge or experience using the tcp and udp input plugins for syslog messages?

Sizing knowledge? I think you need to be a bit more specific.

ok.

How many GB per day can we handle on given hardware? What can impact that? The number of CPUs? The number of regexes? (For example, I am going to have dozens of regexes that identify, for each message, the vendor and maybe the product, e.g. Cisco ASA.)
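
To make that concrete, here is the kind of detection I have in mind (the pattern and field names below are just illustrative):

```
filter {
  # Hypothetical example: tag Cisco ASA events by their "%ASA-" prefix.
  # Each additional vendor/product would get another conditional like this.
  if [message] =~ /%ASA-/ {
    mutate {
      add_field => {
        "vendor"  => "cisco"
        "product" => "asa"
      }
    }
  }
}
```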

How do I know if I am starting to lose data because Logstash cannot handle a given throughput?

When collecting data from UDP and TCP it often makes sense to have the Logstash instance(s) that do the collection do as little processing as possible in order to optimise throughput, and just have them write to a message queue. The introduction of a message queue allows you to buffer data and handle peaks better, while at the same time decoupling collection from processing. You can therefore have a number of Logstash instances read from the message queue in parallel and do the CPU-intensive processing without affecting collection. This processing layer can be scaled out without affecting the collection layer.
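
A minimal sketch of such a collection tier, assuming Redis as the broker (the host, port, and key below are placeholders, not a recommendation):

```
input {
  tcp { port => 5140 type => "syslog" }
  udp { port => 5140 type => "syslog" }
}
# No filters here: the collection tier only receives and forwards.
output {
  redis {
    host      => "broker.example.com"   # hypothetical broker host
    data_type => "list"
    key       => "syslog"
  }
}
```

The processing-tier instances would then read with the redis input using the same key and do all the grok/regex work there.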

Thanks, Chris, for your reply.

Does anyone know, for given hardware, what the throughput limit is for TCP/UDP input with minimal processing? How can I monitor for potential data loss?

What is the given hardware?

Let's start with a basic config:
4 CPUs
16 GB memory
1 GbE network card.
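
For scale, some back-of-envelope arithmetic on that setup (assuming an average syslog message of roughly 300 bytes, which is a guess, not a measurement): 1 GbE is about 125 MB/s, i.e. roughly 10 TB/day of raw wire capacity, so the NIC itself is unlikely to be the bottleneck. Ingesting 100 GB/day would mean about 333 million events, or roughly 3,900 events/sec sustained, and whether 4 cores keep up with that depends mostly on the regex work per event.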

Are you talking purely about parsing the data, or storing it too?

I am more worried about the input side (not losing data because of high throughput) and also about the parsing part, since I have dozens of regexes to run against each event.

Monitoring data loss is hard, because how do you know it's gone?
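
One partial answer for UDP: the kernel counts datagrams it drops when a socket receive buffer overflows, and they show up under "packet receive errors" in `netstat -su`, so a rising counter there while Logstash is the main UDP listener is a strong hint. Inside Logstash itself, the metrics filter can expose the event rate so you can compare it against what the senders claim to emit. A minimal sketch, following the standard metrics filter example:

```
filter {
  metrics {
    meter   => "events"
    add_tag => "metric"
  }
}
output {
  # Print the 1-minute moving average event rate.
  if "metric" in [tags] {
    stdout {
      codec => line { format => "1m rate: %{[events][rate_1m]}" }
    }
  }
}
```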

Given you have a bunch of regexes, you really will need to test this yourself.
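
A common way to test the filter stage in isolation is the generator input plus the dots codec, which emits one character per processed event:

```
input {
  generator {
    count   => 1000000
    # Made-up sample line; use representative events from your own sources.
    message => "<134>Oct 11 22:14:15 fw01 %ASA-6-302013: Built outbound TCP connection"
  }
}
filter {
  # ... the grok/regex filters under test ...
}
output {
  stdout { codec => dots }   # one "." per processed event
}
```

Piping stdout through `pv -abt > /dev/null` then shows the sustained event rate, since each dot is one byte per event.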

Hi.

I have seen this slide in the "latest in Logstash" presentation, but it does not mention the time period. We are probably not talking about EPS.

Does anyone have other benchmarks for the 2.x version?