Hi,
Let's say we have multiple servers providing a service. Users can connect to any of the servers and get served.
Each user and each server has a unique ID. We log an event every time a user 'connects' to a particular server.
Based on this, I want to find out:
a) Users who moved around the most, i.e. first connected to server A, then B, then maybe back to A. Basically, I want to count every time a user connects to a server different from the one it was on before; reconnecting to the same server should not be counted.
b) Users who moved around the least (the ones that didn't hop around much).
Now I am thinking I could read the data, do some processing in an external client application, and ingest additional bits of information for the 'connection' log that would indicate the 'earlier' server and also perhaps whether a change of server occurred.
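To make that concrete, here is a rough sketch of the kind of client-side processing I have in mind. It is only an illustration under assumptions: the field names `user_id`, `server_id` and `timestamp`, and the added fields `previous_server` and `server_changed`, are hypothetical and not our actual schema, and the events are assumed to already be fetched into memory as dicts.

```python
from operator import itemgetter


def annotate_and_count(events):
    """Annotate connection events and count server changes per user.

    `events` is an iterable of dicts with the hypothetical keys
    'user_id', 'server_id' and 'timestamp'.
    Returns (annotated_events, hops_per_user).
    """
    last_server = {}   # user_id -> server of that user's previous connection
    hops = {}          # user_id -> number of server changes seen so far
    annotated = []

    # Process events in time order so "previous" is well defined.
    for ev in sorted(events, key=itemgetter("timestamp")):
        user = ev["user_id"]
        prev = last_server.get(user)
        # A change counts only if there was an earlier connection
        # and it was to a different server.
        changed = prev is not None and prev != ev["server_id"]
        hops[user] = hops.get(user, 0) + int(changed)
        annotated.append(
            {**ev, "previous_server": prev, "server_changed": changed}
        )
        last_server[user] = ev["server_id"]

    return annotated, hops


def rank_by_hops(hops, most_first=True):
    """Return (user_id, hop_count) pairs sorted by hop count."""
    return sorted(hops.items(), key=itemgetter(1), reverse=most_first)
```

Calling `rank_by_hops(hops)` would give the users for (a) at the top, and `rank_by_hops(hops, most_first=False)` the users for (b). The annotated events are what I would ingest back alongside the 'connection' log.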
I would like to know whether there is a better way to figure this out directly with a query.
-Thanks
Nikhil




Since we have diverse teams who will want to analyze the logs differently and in an ever-evolving manner, we don't want to do too much processing during ingestion. What we have instead is a client-side application that queries the data, does the analysis, and adds the analyzed events back. So it becomes an as-needed operation, with the flexibility to do whatever we want inside the Python application.