Request logging from Python to build stats on user usage

Hi all,

We have built a system that has to keep track of the number of calls each user makes against our API. We also have to keep the logging info to build "audit" logs and give users the ability to search and retrieve their data.
So far we have used Graylog as the server we stream the data to, via UDP.
Since I want to migrate to ELK & co (APM as well), what's the best/easiest way to do this?

We have a huge amount of data to stream to Logstash every second and save in Elasticsearch (we stream JSON). Should I open a UDP connection there as well (is there a library for that)? Is there anything provided by Logstash? I've checked APM, but it's mostly for evaluating performance and errors rather than collecting logs.
Our setup is Kubernetes and Docker containers where the API code runs, so shipping log files may not be easy, while streaming directly from the app could be a bit better.

PS: is Logstash able to parse the JSON data and maybe route it to different indexes? Since the data stored will be "huge", I would like to create indexes per user (this would be handy for exporting data, but makes it almost impossible to remove data after X days), or split them by document volume or time (this makes it easier to remove indexes after X days, right?). Is this feasible?

Sure it is possible.

Logstash has a gelf input that can use TCP or UDP, and a JSON codec to decode the data.

Some people find that one Logstash instance is not enough to cope with a very high load, although this is partly determined by the performance of the Elasticsearch cluster.

There are various designs for parallelisation. The elasticsearch output can use values from a document to choose which index that document goes to.
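As a rough sketch of that output (the `user` field name and the daily index pattern are assumptions, not anything from your data):

```conf
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    # sprintf references pull values from the event itself;
    # the date suffix lets you drop whole indexes after X days
    index => "api-logs-%{user}-%{+YYYY.MM.dd}"
  }
}
```

Note that per-user index names can explode the index count if you have many users, which is one reason time-based indexes are the more common pattern for retention.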

Right now I'm using https://github.com/vklochan/python-logstash, which sends JSON over UDP. What would be the benefit of using the GELF input?

Is there any known performance limit, say X messages/second or thereabouts, just to understand the order of magnitude?

That Python library looks good. You will need a TCP input and a json filter. The json codec would need a newline as the final character of the TCP payload, and I'm not sure the newline will be there.
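A minimal pipeline along those lines might look like this (the port number is an assumption); using the json filter on the `message` field avoids depending on the codec's newline handling:

```conf
input {
  tcp {
    port => 5000
    # the line codec splits the stream on newlines into one event per line
    codec => line
  }
}
filter {
  json {
    # parse the raw line into structured fields
    source => "message"
  }
}
```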

Ignore what I said about the gelf input, but the rest of my comment is valid.

It actually works. I need to parse inner messages, but that's a Logstash pipeline configuration. I'm using UDP rather than TCP for performance.
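For reference, the UDP transport boils down to something like this stdlib-only sketch (the port, the field names, and the helper names are assumptions for illustration, not the python-logstash wire format):

```python
import json
import socket


def serialize_event(event: dict) -> bytes:
    """Encode one log event as a compact JSON datagram."""
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


def send_event(event: dict, host: str = "127.0.0.1", port: int = 5959) -> None:
    """Fire-and-forget UDP send: no connection setup and no delivery
    guarantee, which is the performance trade-off versus TCP."""
    payload = serialize_event(event)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    send_event({"user": "alice", "endpoint": "/v1/search", "status": 200})
```

The lack of delivery guarantees is the flip side of the speed: under load, dropped datagrams simply vanish, which may or may not be acceptable for audit logs.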

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.