I am just starting my adventure with Elasticsearch and I am thinking of using it in my upcoming project, as it seems to be a good fit.
The ElasticSearch part of the project would be to store and search through hundreds of thousands of small documents, which will contain the following data: Timestamp, LocationId, ObjectId.
The meaning of the data is:
Timestamp - the moment when the document was added to Elasticsearch
LocationId - the ID of a region, for example: Germany, Spain, Austria, Italy etc.
ObjectId - the ID of an object that needs to be analyzed: a device which is either doing something or idle
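To make the document shape concrete, here is a minimal Python sketch of one such document. The class name `ActivityDoc`, the helper `to_source` and the sample values are only illustrative assumptions; nothing here is an Elasticsearch API, it just shows the three fields and the compact timestamp format I use in the examples in this question.

```python
from dataclasses import dataclass
from datetime import datetime

# Compact timestamp format used in this question (year, month, day, hour, minute, second).
TS_FORMAT = "%Y%m%d%H%M%S"


@dataclass(frozen=True)
class ActivityDoc:
    """Hypothetical in-memory view of one stored document."""
    timestamp: datetime  # moment the document was added
    location_id: str     # region, e.g. "Germany"
    object_id: str       # device that reported activity


def to_source(doc: ActivityDoc) -> dict:
    """Render the document with the field names used in this question."""
    return {
        "Timestamp": doc.timestamp.strftime(TS_FORMAT),
        "LocationId": doc.location_id,
        "ObjectId": doc.object_id,
    }


# Sample values are made up for illustration.
example = ActivityDoc(
    timestamp=datetime.strptime("20190102221022", TS_FORMAT),
    location_id="Germany",
    object_id="device-1",
)
print(to_source(example))
```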
Now, I would have a service which creates the above documents in Elasticsearch based on messages sent from each ObjectId. Every ObjectId would send some data periodically while it is doing something (for example every 30 seconds), but it would not send anything while it is idle. Also, some of the messages could be lost or delayed (transportation issues).
The problem I have is preparing a query which would give me information about the length of continuous activity of each device. For example: list all ObjectIds which at 20190102221022 (date format: yyyyMMddHHmmss) had been working continuously for longer than 30 minutes in the same location, assuming that the maximum allowed gap between the timestamps of documents created by this ObjectId (for the device to still count as working) is 5 minutes. As an output I would expect a list of ObjectIds, grouped by LocationId, where each ObjectId additionally includes information about when the first activity was reported and when the last activity was reported.
If an ObjectId sends activity data from 2 different locations at the same time, this means it is operating in both locations at the same time.
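To show exactly what result I am after, here is a minimal sketch in plain Python of the logic, run over documents already loaded into memory. This is not an Elasticsearch query, only a reference for the expected output. The function names, the sample devices and two details are my own assumptions: a window counts as covering the requested moment if the moment is not later than the last document plus the allowed gap, and "longer than 30 minutes" is measured from the first document of the window to the requested moment.

```python
from collections import defaultdict
from datetime import datetime, timedelta

TS_FORMAT = "%Y%m%d%H%M%S"  # compact timestamp format used in this question


def parse_ts(text):
    return datetime.strptime(text, TS_FORMAT)


def activity_windows(docs, max_gap):
    """Split every (LocationId, ObjectId) stream into windows of continuous activity.

    docs: iterable of (timestamp, location_id, object_id) tuples, in any order.
    Returns {(location_id, object_id): [(first, last), ...]}.
    """
    streams = defaultdict(list)
    for ts, location_id, object_id in docs:
        # Each location is tracked separately, so one device can be active in two places.
        streams[(location_id, object_id)].append(ts)

    windows = {}
    for key, stamps in streams.items():
        stamps.sort()
        found = []
        first = last = stamps[0]
        for ts in stamps[1:]:
            if ts - last > max_gap:  # gap too large: the previous window ends here
                found.append((first, last))
                first = ts
            last = ts
        found.append((first, last))
        windows[key] = found
    return windows


def working_at(docs, at, min_duration, max_gap):
    """Return {location_id: {object_id: (first, last)}} for devices working at `at`."""
    result = {}
    for (location_id, object_id), found in activity_windows(docs, max_gap).items():
        for first, last in found:
            covers = first <= at <= last + max_gap   # assumption, see text above
            long_enough = at - first > min_duration  # assumption, see text above
            if covers and long_enough:
                result.setdefault(location_id, {})[object_id] = (first, last)
    return result


# Made-up sample data: device-1 reports every 4 minutes from 21:30:00 to 22:14:00.
start = parse_ts("20190102213000")
docs = [(start + timedelta(minutes=4 * i), "Germany", "device-1") for i in range(12)]
# device-2 has a long pause, so its current window only started at 22:00:00.
docs += [
    (parse_ts(t), "Spain", "device-2")
    for t in ("20190102210000", "20190102210400", "20190102220000",
              "20190102220400", "20190102220800", "20190102221200")
]

result = working_at(
    docs,
    at=parse_ts("20190102221022"),
    min_duration=timedelta(minutes=30),
    max_gap=timedelta(minutes=5),
)
print(result)
```

With this sample data only device-1 in Germany is returned, and its reported window ends at 22:14:00, which is after the requested moment.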
The idea is that I would be able to use the query either in real time (find which devices are active now and in which location) or to get information from the past. For example, I would ask using a 20190101... timestamp and get the full time window of the device activity, even if the operation finished after the date defined by the timestamp (for example from 20190101... to 20190103...).
To verify the query, I would pick a random ObjectId and LocationId from the returned result, search for all documents with that pair which were created between FirstActivityTime and LastActivityTime, and sort them by timestamp ascending. The difference between consecutive timestamps must not be greater than 3 minutes. I would also search for the document just before FirstActivityTime: either the difference should be larger than 3 minutes or the document should not exist. The same check applies to the document just after LastActivityTime.
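The verification described above could be sketched like this in Python. Again this is only an illustration of the check, not Elasticsearch code; the function name `verify_window` and the sample timestamps are made up, and I additionally assume that FirstActivityTime and LastActivityTime must themselves be timestamps of existing documents.

```python
from datetime import datetime, timedelta


def verify_window(stamps, first, last, max_gap):
    """Check one reported (first, last) window for a single ObjectId/LocationId pair.

    stamps: all document timestamps for that pair, in any order.
    """
    stamps = sorted(stamps)
    inside = [ts for ts in stamps if first <= ts <= last]

    # Assumption: the window boundaries are real documents.
    if not inside or inside[0] != first or inside[-1] != last:
        return False

    # No two consecutive documents inside the window may be further apart than max_gap.
    if any(b - a > max_gap for a, b in zip(inside, inside[1:])):
        return False

    # The closest document before the window must be missing or far enough away.
    before = [ts for ts in stamps if ts < first]
    if before and first - before[-1] <= max_gap:
        return False

    # The same rule for the closest document after the window.
    after = [ts for ts in stamps if ts > last]
    if after and after[0] - last <= max_gap:
        return False

    return True


# Made-up sample: activity at minutes 0, 2, 4, 6, then a pause, then 20 and 22.
base = datetime(2019, 1, 2, 22, 0, 0)
stamps = [base + timedelta(minutes=m) for m in (0, 2, 4, 6, 20, 22)]
gap = timedelta(minutes=3)

ok = verify_window(stamps, base, base + timedelta(minutes=6), gap)
cut_short = verify_window(stamps, base, base + timedelta(minutes=4), gap)
spans_pause = verify_window(stamps, base, base + timedelta(minutes=22), gap)
print(ok, cut_short, spans_pause)
```

Here the first window is accepted, the second is rejected because another document follows within the allowed gap, and the third is rejected because it spans the pause.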
The non-optimal solution would be to load all the data from Elasticsearch into an application and iterate through each document in custom code, but this would mean a lot of memory and CPU consumption, and a lot of data would have to be moved between Elasticsearch and an analytics service quite frequently.
So, the question is: which approach in Elasticsearch would be best to get the desired results? Is it actually possible to solve this on the Elasticsearch side?
Thank you for your help in advance!