Hi! I need some advice.
This is a very simplified version of what I have.
I've developed an API that returns metrics about things that happen on my system. For example, how long a user has been logged in, how many times he has logged in, how long he uses a certain functionality, and so on.
My system sends events to RabbitMQ, and I have a service that listens for those events and simply adds them to an index. The events are raw and simply indicate things like "User A logged in at 3pm", "User A logged out at 4pm", "User A entered the monitoring section at 3:15pm".
My API then performs calculations over those index entries locally. To extract some metrics, we sometimes perform searches with aggregations.
The user can request information within a time range (for example, give me the max time that the user was logged in last year) or he simply might want to know if the user is currently logged in.
We are now facing performance issues. There are a lot of entries in the indexes and performing a query is getting expensive both in ElasticSearch and in the API.
The first problem is that I might have 100 users watching a dashboard, which makes 100 requests that end up running the exact same ElasticSearch query 100 times. I've Googled a bit but I didn't find a cache mechanism that can detect that the request is exactly the same. Do you know of something like that? However, using a cache would lose the realtime nature of the information. So, any suggestion to overcome this problem? Is there a way for Elastic to see that an identical query is already running and not run it again?
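To make the idea concrete, here is a minimal sketch in Python of the behaviour I have in mind on the API side: identical requests that arrive while a query is running share that one query, and the result is reused for a very short time (1 second here) so the data stays close to realtime. This is not something I have running. The names `CoalescingCache`, `run_query` and `fake_search`, and the TTL value, are all hypothetical, and `fake_search` only stands in for the real search call.

```python
import asyncio
import json
import time


class CoalescingCache:
    """Share one backend call among identical concurrent requests (hypothetical helper)."""

    def __init__(self, ttl_seconds=1.0):
        self.ttl = ttl_seconds
        self._inflight = {}  # query key -> asyncio.Task currently running
        self._results = {}   # query key -> (expires_at, value)

    @staticmethod
    def key_for(query):
        # Two queries are "the same" if their JSON with sorted keys is identical.
        return json.dumps(query, sort_keys=True)

    async def get(self, query, run_query):
        key = self.key_for(query)
        cached = self._results.get(key)
        if cached is not None and cached[0] > time.monotonic():
            return cached[1]  # still fresh enough, no backend call
        task = self._inflight.get(key)
        if task is None:
            # First caller starts the query; later callers await the same task.
            task = asyncio.create_task(self._run(key, query, run_query))
            self._inflight[key] = task
        return await task

    async def _run(self, key, query, run_query):
        try:
            value = await run_query(query)
            self._results[key] = (time.monotonic() + self.ttl, value)
            return value
        finally:
            self._inflight.pop(key, None)


async def demo():
    calls = 0

    async def fake_search(query):
        # Stand-in for the real search; counts how often it is actually hit.
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return {"logged_in": True}

    cache = CoalescingCache(ttl_seconds=1.0)
    query = {"term": {"user": "A"}}
    results = await asyncio.gather(
        *(cache.get(query, fake_search) for _ in range(100))
    )
    return calls, results
```

Running `asyncio.run(demo())` simulates the 100 dashboard requests. The short TTL is the trade-off I'm unsure about: the data can be up to one second stale.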
The second problem is that sometimes I have to retrieve all the documents in one or more indexes to perform calculations over them. For example, to extract the logged-in time, I have to read all the documents of that user in the specified time range and sum the times. What is your advice on this? Should I have a kind of "snapshot" with precalculated values? Can I do this kind of calculation during the search?
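For context, this is roughly the kind of calculation I mean, as a simplified Python sketch rather than my real code. The event shape (a dict with `type` and `timestamp`) and the function name `total_logged_in` are made up for the example. It also assumes that a session still open at the end of the range counts up to the end of the range.

```python
from datetime import datetime, timedelta


def total_logged_in(events, range_start, range_end):
    """Sum login -> logout durations for one user, clipped to the time range.

    `events` is a list of dicts with hypothetical keys:
    "type" ("login" or "logout") and "timestamp" (a datetime).
    """
    total = timedelta(0)
    login_at = None
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["type"] == "login":
            login_at = event["timestamp"]
        elif event["type"] == "logout" and login_at is not None:
            # Only count the part of the session inside the requested range.
            start = max(login_at, range_start)
            end = min(event["timestamp"], range_end)
            if end > start:
                total += end - start
            login_at = None
    # Assumption: a session with no logout yet is counted up to range_end.
    if login_at is not None and range_end > login_at:
        total += range_end - max(login_at, range_start)
    return total


events = [
    {"type": "login", "timestamp": datetime(2024, 1, 1, 15, 0)},
    {"type": "logout", "timestamp": datetime(2024, 1, 1, 16, 0)},
    {"type": "login", "timestamp": datetime(2024, 1, 1, 17, 0)},
]
print(total_logged_in(events, datetime(2024, 1, 1, 15, 30), datetime(2024, 1, 1, 18, 0)))
```

Every request repeats this over the raw documents, which is why I'm wondering about precalculated values.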
Thank you in advance for the help.