Hello,
I have an index that collects one geospatial data point per second for each device (e.g., a moving car).
Now, I need to retrieve the geopoints in order to plot the path that a given vehicle has made. However, plotting a path 2 hours long at such a frequency ends up with 3600*2 = 7200 data points, which is definitely too much for this use case (though I do need that resolution for other use cases).
Is there a way to query the index and retrieve only a sample of those 7200 points?
We have a sampler aggregation which can take the top N hits and feed them to a contained child aggregation.
Additionally there's a diversified_sampler which may be useful to ensure the selection of docs is not focused in any one particular time range or location.
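A minimal sketch of what that could look like, assuming an index named `vehicle-tracks` with a `geo_point` field called `location`, a `date` field called `timestamp`, and a `keyword` field called `vehicle_id` (all names here are illustrative, not from your mapping):

```json
POST /vehicle-tracks/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "vehicle_id": "car-42" } },
        { "range": { "timestamp": { "gte": "now-2h" } } }
      ]
    }
  },
  "aggs": {
    "sample": {
      "sampler": { "shard_size": 500 },
      "aggs": {
        "path_points": {
          "geohash_grid": { "field": "location", "precision": 7 }
        }
      }
    }
  }
}
```

Here the `sampler` limits each shard to its top 500 matching docs, and the child `geohash_grid` aggregation buckets their locations. Note that with a pure filter query all docs score identically, so "top N" is effectively arbitrary rather than uniformly random.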
Elasticsearch replies that "significant_terms aggregation cannot be applied to field [location]. It can only be applied to numeric or string fields." I'm wondering whether I'm setting up the query incorrectly or whether this limitation actually exists.
Wasn't the goal to put your geo aggregation under the sampler?
You may need a coarser-granularity field for diversification too - if the timestamp accuracy is millisecond level, you'll only be limiting the number of docs considered per millisecond. You might need to use a script to round the times down to hours or minutes or whatever suits as your unit for de-duplicating.
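Roughly, that could look like the following (a sketch under the same assumed field names as above; integer division by 60000 rounds epoch millis down to minute buckets, and the Painless accessor shown is the one for recent Elasticsearch versions):

```json
POST /vehicle-tracks/_search
{
  "size": 0,
  "aggs": {
    "diverse_sample": {
      "diversified_sampler": {
        "shard_size": 200,
        "max_docs_per_value": 1,
        "script": {
          "lang": "painless",
          "source": "doc['timestamp'].value.toInstant().toEpochMilli() / 60000L"
        }
      },
      "aggs": {
        "path_points": {
          "geohash_grid": { "field": "location", "precision": 7 }
        }
      }
    }
  }
}
```

With `max_docs_per_value: 1`, at most one document per minute bucket enters the sample, which spreads the selection across the whole time range.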
I'd have to refactor the index a bit then; I want to avoid scripted fields to keep performance high enough.
Meanwhile I'm using the function_score query with a random function. This gives a uniform random scoring that lets me drop enough data and still get a reasonably good-quality sample.
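For reference, a sketch of that approach (same illustrative field names as above; `random_score` with a `seed` also needs a `field` to derive per-document randomness from, and `boost_mode: replace` discards the query score so only the random value ranks the hits):

```json
POST /vehicle-tracks/_search
{
  "size": 200,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": [
            { "term": { "vehicle_id": "car-42" } },
            { "range": { "timestamp": { "gte": "now-2h" } } }
          ]
        }
      },
      "random_score": { "seed": 42, "field": "_seq_no" },
      "boost_mode": "replace"
    }
  }
}
```

Taking the top 200 hits by this random score yields an approximately uniform sample of the 7200 points; the client then re-sorts them by `timestamp` before drawing the path.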