Recommended ELK + PHP "topology" & how to set variable type from within PHP?

RVN_BR · May 18, 2015, 3:45pm

Hi, I have a two-fold question. I will try to keep it as simple as possible. I have searched a lot and found examples of both approaches, but I still haven't found any pros/cons that would be decisive.

Objective:

Multi-server system, with multiple physical and virtual setups.
Centralize metrics and logs from several applications and system applications into the ELK stack.
Log app events and metrics from within our own application. This is mostly PHP-based. We are using the Monolog logging class.

Questions:
1 - Should we use a centralized Redis instance (with high availability/failover) as a log collector? (PHP > Redis > Logstash > ES) Would other apps such as system apps also feed into Redis?
1b - Should I ditch Redis in favor of a local logstash-forwarder on each host to hold the logs and forward them to another central host?
Basically, either the app writes to Redis, and Logstash reads from Redis and writes to ES; or the app writes to Logstash, logstash-forwarder writes to a centralized Logstash, and the centralized Logstash writes to ES. What is the most common architecture? We have Redis experience in-house and use it a lot, but I don't want to add moving parts unless it's a good idea. I was told the Redis layer is a way of protecting against data loss, but if I have logstash-forwarder on each instance, I understand logstash-forwarder will also hold the logs until the centralized Logstash (the aggregator) is back up?

and

2 - What is the correct way of setting the type for the entry from within the PHP app? I tried modifying the Logstash input to something like type => [fields][ctxt_event] or type => [ctxt_event]. The former fails on startup; the latter gives me a fixed type with the string 'ctxt_event'. When I omit this, every entry gets the type 'logs', which I'm not setting anywhere (at least not intentionally). I guess this may also be partly a Monolog implementation issue, but maybe not?

Thanks for all help. If you just have articles or references from people who have gone down this road, that would be a great response too. I haven't really found a lot of info, and much of what I do find is quite old.

warkolm · May 18, 2015, 9:08pm

1 - Up to you. I wouldn't, as you are then locked into that method unless you make a code change. Outputting to TCP or even a file gives you better flexibility.
1b - Another one that's up to you. Having a broker of some sort in the pipeline is handy though.
2 - The type 'logs' is the Logstash default, and you can only ever set one _type value in ES; it is not an array. I don't know how to do that in PHP, but if you want multiple values then you probably want to use tags, not types.
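
On the Logstash side, one possible approach is sketched below; it is an illustration, not something tested against this setup. The input's type option takes a literal string, which matches the fixed 'ctxt_event' value reported in the question. Copying a field's value into type has to happen in a filter, using the %{field} sprintf syntax. The field name ctxt_event is taken from the question, and the sketch assumes the event was decoded as JSON so that the field exists at the top level.

```
filter {
  # Only rewrite the type when the event actually carries the field.
  if [ctxt_event] {
    mutate {
      # %{ctxt_event} is substituted with that field's value per event.
      replace => { "type" => "%{ctxt_event}" }
    }
  }
}
```

The Logstash documentation also notes that an input's type setting does not override a type already present on an event, so if the app's JSON already contains a type key and the input decodes it with a JSON codec, that value should be kept without any filter.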

RVN_BR · May 18, 2015, 9:29pm

When I said multiple types, I didn't mean multiple per log entry; I meant one per log entry, as a way of filtering data. However, I may keep all PHP-generated data in one type, or make one large per-app type instead, and do further filtering on some other field such as "eventcode" or something. Are there things we can do based on _type that one cannot do based on other fields?

My thinking is using Logstash on each server, outputting to a central redis node (or multiple nodes/cluster), and then one or more logstash indexers reading inputs from redis.

So the log "collectors" would all be taking input from multiple sources (such as syslogs, other files, etc.) and outputting into Redis. The log indexers would be reading from Redis, doing whatever filtering may be suitable (should filtering be going on in the collectors instead? My thinking here is to relieve my "web nodes" of the heavy lifting), and outputting to the ES cluster.

In the above scenario, the PHP app can output directly to Redis (using the Monolog library + Logstash handler) or to the local host through UDP or TCP. I'm not sure which would be best from a performance point of view. If all filtering is going on in the indexer nodes, going from PHP straight to the Redis "broker" may be the best? I have found very few references online about people using PHP + Redis + ELK. The examples I found are writing straight to Redis as far as I am able to tell, but I suppose I could treat the localhost as a Logstash instance and write to that instead, and have the localhost forward those too. I'm unsure whether that is just "one more stop" (PHP to logstash-forwarder plus logstash-forwarder to Redis, versus PHP to Redis) or whether adding that local write would actually be much quicker.

Thanks for your input once again.

warkolm · May 20, 2015, 8:10am

You're better off filtering after putting into redis, that way you keep the collection part of the chain super simple and fast, and you can scale the indexing/filtering part very easily.
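
A rough sketch of that split, as two separate Logstash configs shown in one block. The hostnames redis.internal and es.internal, the Redis key logstash, and the log path are placeholders rather than anything from this thread. The Elasticsearch output option is hosts in Logstash 2.x and later; 1.x releases use host instead, so check the option names for your version.

```
# --- collector config (on each web node): no filters, just ship ---
input {
  file {
    path => "/var/log/syslog"     # placeholder path
    type => "syslog"
  }
}
output {
  redis {
    host => "redis.internal"      # placeholder broker host
    data_type => "list"
    key => "logstash"             # placeholder list key
  }
}

# --- indexer config (separate host): read from Redis, filter, index ---
input {
  redis {
    host => "redis.internal"
    data_type => "list"
    key => "logstash"
  }
}
# grok, date, mutate and other filter blocks would go here,
# so the parsing cost stays off the web nodes.
output {
  elasticsearch {
    hosts => ["es.internal:9200"] # 1.x releases use `host` instead
  }
}
```

Because the indexers only pull from the Redis list, more of them can be started against the same key when filtering becomes the bottleneck, without touching the collectors.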

How you get to redis is, again, up to you. But testing a few things would definitely be a good idea.
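
For the "app writes to a local Logstash" variant, the collector needs a network input. A minimal sketch, assuming the app sends newline-delimited JSON over TCP or one JSON document per datagram over UDP; port 5000 and the Redis host and key are placeholders.

```
input {
  tcp {
    port => 5000
    codec => json_lines   # newline-delimited JSON over a stream
  }
  udp {
    port => 5000
    codec => json         # one JSON document per datagram
  }
}
output {
  redis {
    host => "redis.internal"   # placeholder broker host
    data_type => "list"
    key => "logstash"          # placeholder list key
  }
}
```

UDP is fire-and-forget, so the app is not held up by the listener, but events sent while the listener is down are lost. TCP gives delivery feedback at the cost of the app having to handle a slow or unavailable listener. Writing straight to Redis removes a hop but couples the app to the broker, which is the lock-in mentioned earlier in the thread.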