Handle frequently updated geo data in Elasticsearch

I have location (geo point) data that is updated every 5 minutes for millions of users. We have to search for users with specific attributes (age, interests, languages) within a particular geo range. I wanted to understand the right strategy for storing such data in Elasticsearch.

Option 1

Create a user document with the following keys:

  • User metadata & attributes (age, interests, languages, salary, etc.; around 8-10 searchable attributes)
  • Live location (changing every few minutes)

"liveLocation" : {
    "type" : "Point",
    "coordinates" : [-72.333077, 30.856567]
}
  • Location data: multiple addresses (home address, work address, etc.) along with geo points (not updated frequently)

"addresses" :
 [
    {
        "type" : "home",
        "address" : "first floor, xyz, near landmark",
        "city" : "New York",
        "country" : "Country",
        "zipcode" : "US1029",
        "location" : {
            "type" : "Point",
            "coordinates" : [-73.856077, 40.848447]
        }
    },
    {
     ... more types of addresses
    }
 ]
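For reference, a mapping along these lines would make all of the geo fields above searchable with geo queries (index name and field types here are my assumptions based on the sample documents; `geo_point` accepts the GeoJSON-style `"type": "Point"` objects shown above):

```json
PUT /users
{
  "mappings": {
    "properties": {
      "age":          { "type": "integer" },
      "interests":    { "type": "keyword" },
      "languages":    { "type": "keyword" },
      "liveLocation": { "type": "geo_point" },
      "addresses": {
        "properties": {
          "type":     { "type": "keyword" },
          "address":  { "type": "text" },
          "city":     { "type": "keyword" },
          "country":  { "type": "keyword" },
          "zipcode":  { "type": "keyword" },
          "location": { "type": "geo_point" }
        }
      }
    }
  }
}
```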

We want to perform geo search queries over all the geo-type fields. My worry: the live location for users will be updated quite frequently.

Q1. Will this be a viable option considering the frequent updates?
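For illustration, with Option 1 each 5-minute refresh would typically be sent as bulk partial updates, something like the sketch below (index name and document IDs are placeholders):

```json
POST /users/_bulk
{ "update": { "_id": "user-123" } }
{ "doc": { "liveLocation": { "type": "Point", "coordinates": [-72.333077, 30.856567] } } }
{ "update": { "_id": "user-456" } }
{ "doc": { "liveLocation": { "type": "Point", "coordinates": [-71.901211, 31.110345] } } }
```

Note that even a partial update causes Elasticsearch to reindex the whole document internally, which is exactly why the update frequency matters here.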

Option 2

  • Treat every location update as time-series data and insert a new document. This avoids updating documents; instead, a new document is inserted for each user every few minutes.

Q2. When searching for all users (home/office/live location) in a particular geo polygon, I have to consider only the most recently updated document for each user. How can I do that in Elasticsearch?
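One approach I'm aware of (a sketch, not from this thread): give each update document a `userId` and `timestamp` field, then use field collapsing so that at most one document per user is returned, sorted so the newest one wins. Index and field names below are assumptions:

```json
GET /location-updates/_search
{
  "query": {
    "geo_polygon": {
      "location": {
        "points": [
          { "lat": 40.73, "lon": -74.1 },
          { "lat": 40.01, "lon": -71.12 },
          { "lat": 50.56, "lon": -90.58 }
        ]
      }
    }
  },
  "collapse": { "field": "userId" },
  "sort": [ { "timestamp": "desc" } ]
}
```

Two caveats: on recent versions the `geo_polygon` query is deprecated in favour of a `geo_shape` query with a polygon, and the polygon filter runs before collapsing, so a user whose latest ping is outside the polygon could still match on an older ping. Guaranteeing "latest position only" is easier with a separate latest-position index, as in Option 1.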

Q3. We have to search for users with specific attributes (age, interests, languages) within a particular geo polygon. If Option 2 is preferable, should the user attribute metadata & the location updates be treated as a parent-child relationship?
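For reference, parent-child in Elasticsearch is modelled with a `join` field inside a single index; a minimal mapping sketch (relation and field names are hypothetical):

```json
PUT /users
{
  "mappings": {
    "properties": {
      "userRelation": {
        "type": "join",
        "relations": { "user": "locationUpdate" }
      }
    }
  }
}
```

Worth knowing before choosing this: `has_child`/`has_parent` queries are markedly slower than queries on flat documents, and parent and child documents must be routed to the same shard, which adds operational complexity at this update volume.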

Q4. Conclusion: what would be the right approach?

A frequent update is more costly than an insert because it requires retrieving the existing document and reindexing it. I think the best approach is to run a benchmark based on your data and observe the performance metrics when updating frequently.


Updating the geo data every 5 minutes should be fine as it is not very frequent.

If you need access to all historic location information as well as fast access to the latest position, an approach I have seen used is to simply combine Options 1 and 2. You have a time-based index into which you insert each update as a separate document; this index can be used for queries that analyze movement or positions over time. At the same time you have another index with one document per object you are tracking, which you update with every change. This does not contain the history, but it can be used efficiently to look up the last known position.
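A sketch of that dual write, assuming a daily time-based index and a separate latest-position index (all index names, IDs, and values below are placeholders); both writes can even go in the same `_bulk` request:

```json
POST /_bulk
{ "create": { "_index": "locations-2024.05.01" } }
{ "userId": "user-123", "timestamp": "2024-05-01T12:05:00Z", "location": { "lat": 30.856567, "lon": -72.333077 } }
{ "update": { "_index": "users-latest", "_id": "user-123" } }
{ "doc": { "liveLocation": { "lat": 30.856567, "lon": -72.333077 }, "lastSeen": "2024-05-01T12:05:00Z" } }
```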


Thanks Christian for your response.
I was just wondering: if I start updating the location of 10% of my user base every 5 minutes, will my Elasticsearch server be busy updating the index, and will that impact search query performance?

Let's say the user base for one state/province is ~10 million.
10% of users sending their location every 5 minutes means 1 million location updates every 5 minutes, i.e. roughly 3,300 updates per second sustained.

Let me know your thoughts.

Indexing new documents is not free either, and it results in a heavier query load. I suspect you will need to test what makes the most sense for your use case, as the ratio of queries to updates together with your latency requirements will drive this. In the end you will need to size the cluster according to the load volume and mix.
