Hi All,
My documents that I index in ES look conceptually similar to this:
{
"city" : "blah blah",
"country" : "blah blah",
"temperature" : "12.0",
"measureDate": "2015-11-29T11:00:00.000Z"
...
}
Note: There can be multiple documents(measures) per city per day.
Got one new Index per day with an overall amount of ~250.000.000 documents per day.
Now the question that I try to answer is as follows:
For a specific country (terms filter), please give me all cities with the highest temperature difference from one day to another (using the max temperature of the specific city) sorted by the highest difference.
Does anyone have a good idea on how to do such joins across two time slices?
So far I do two terms aggregations on the city (one on each of the queries indices for the two days), spit out the max temperature for each bucket and then calculate the difference on the client side.
Then I sort the result on the client side, generate the document IDs (they are derived by fields I have as a result of the terms aggregation) and query the top 10 documents.
The problem with this is that the initial two terms aggregation take very long (~20 seconds for a large country with a lot of cities).
This is most likely because I can not limit those two terms aggretations. I need all the results (size: 0) in order to properly do the joins on the client side.
Is Elasticsearch maybe not the right choice for this kind of Problem?
Or could I use a scripted aggregation (to let ES calculate the difference) to my advantage?
A derivative pipeline aggregation in theory does exactly what I need but again I would need to run this across ALL city terms (and there may be a lot) because I can not order the buckets by the 'difference' to the other day.
This is not going to be any better in terms of performance.
Hope this is somewhat understandable. Any input or suggestion would be very helpful.
Cheers
Robert