Best practice for handling asynchronous events

(Tim Pütz) #1

Hey guys,
I am trying to index events that occur at different times and at irregular intervals.

My situation
There are 2 servers with a middleware between them. Server A sends a request to do a certain job. Server B handles this request and executes the job. After the job is done, it sends a response. My job is to correlate these events. I know when Server A sends its request (timestamped),
I know when Server B sends its response (timestamped), and I can correlate them because they share an id.
I do not know whether Server B is going to send a response at all (maybe something failed), and I do not know when the answer will come ... 100 ms or maybe 1 hour (depends on the job / task) ...

So, I have this log file (and later on a stream) and I want to correlate now. My idea would be to index:

  • the request (at request time)
  • the response (at response time)
  • the correlated data (at response time)

But for now I can only index the request and the response. I know that I can clone an event (clone the response), but I do not know how to edit that clone or anything like it.

Do you have any ideas about a good way to solve this with Logstash?
I thought about a small C++ program which sends 2 events to Logstash when a response is sent by Server B, but would that be best practice?

Thanks for your help and sorry for my English :stuck_out_tongue:

(Magnus Bäck) #2

Perhaps you can store the requests in a database and use the jdbc_streaming filter to retrieve and merge the request data when the response arrives?
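For reference, a minimal sketch of what that could look like, assuming the requests are stored in a MySQL table `requests` keyed by the shared id (driver path, connection details, and field names are all placeholders):

```
filter {
  if [type] == "response" {
    jdbc_streaming {
      jdbc_driver_library => "/path/to/mysql-connector-java.jar"
      jdbc_driver_class => "com.mysql.jdbc.Driver"
      jdbc_connection_string => "jdbc:mysql://localhost:3306/requests_db"
      jdbc_user => "logstash"
      jdbc_password => "secret"
      # look up the stored request by the id shared with the response
      statement => "SELECT sent_at FROM requests WHERE request_id = :id"
      parameters => { "id" => "correlation_id" }
      # the query result is merged into the response event under this field
      target => "request"
    }
  }
}
```

Something (e.g. a jdbc output or a separate process) would still have to write each request into that table first.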

(Tim Pütz) #3

@magnusbaeck Yeah, that's what I have been thinking about for the last 3 hours.

I already created an ERD, implemented it, and wrote the needed SELECT statements. At this point I realised that I would need to join Request and Response, and that it would cost a lot of performance and disk space.

I have 2 more ideas; I am sure you can tell me the pros and cons of both.

  1. What if we save every request in Elasticsearch and, when the matching response appears, add a field "finished_at" with the response timestamp? Would this be performant? Is this possible with the elasticsearch output? Is it possible to create a scripted field "duration" which calculates the duration from the request start until now (when no response has appeared yet) or until the response timestamp? Would this scripted field be performant?

  2. We use the aggregate filter to "store" the requests until a response appears or a 20-minute time limit is hit. Personally I think this would be the better idea, but I do not know much about the aggregate filter. Would it "store" the requests locally until the 20 minutes are up or a matching response appears, or would it index the request in Elasticsearch and update it later (like in 1.)? I assume it runs asynchronously, but how many requests could it "store"?
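For idea 1, the elasticsearch output can do this: use the shared id as the document id and let the response event update the document the request event created. A minimal sketch (host, index, and field names are assumptions, not from the thread):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "jobs"
    document_id => "%{correlation_id}"  # the id shared by request and response
    action => "update"
    doc_as_upsert => true               # request creates the doc, response updates it
  }
}
```

The "duration until now" part would then be a scripted field computed at query time in Kibana, which is recalculated for every matching document on every search.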
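For idea 2, the aggregate filter keeps its maps in the memory of the Logstash process (so capacity is bounded by heap, and nothing is indexed twice), and it needs a single pipeline worker (`-w 1`) so that request and response pass through the same filter instance. A rough sketch, assuming a `type` field distinguishes the two event kinds and `correlation_id` is the shared id:

```
filter {
  if [type] == "request" {
    aggregate {
      task_id => "%{correlation_id}"
      code => "map['request_at'] = event.get('@timestamp')"
      map_action => "create"
      # emit an event for requests that never get a response
      push_map_as_event_on_timeout => true
      timeout => 1200                       # 20 minutes
      timeout_tags => ["request_timed_out"]
      timeout_task_id_field => "correlation_id"
    }
  }
  if [type] == "response" {
    aggregate {
      task_id => "%{correlation_id}"
      code => "
        event.set('request_at', map['request_at'])
        event.set('duration_ms',
          ((event.get('@timestamp').to_f - map['request_at'].to_f) * 1000).round)
      "
      map_action => "update"
      end_of_task => true   # drop the map entry once correlated
    }
  }
}
```

With this, the correlated data (request timestamp plus duration) goes out on the response event itself, and timed-out requests are flushed as separate tagged events instead of being silently dropped.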

I really appreciate your help, thanks!

Every idea and explanation would be a gift.

(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.