Maxing out update queue `remote_transport_exception`

minder1 · August 5, 2019, 7:46pm

I have set up ES to handle the data stream for a basic Tinder-like dating app: profiles and their swipes. My mapping looks something like this:

{
name: text,
dob: date,
...,
swipedOnBy: Array<(profile IDs)>,
liked: Array<(profile IDs)>
}

`swipedOnBy` is a list of all profile IDs that have swiped on a given profile. It is used to efficiently hide people that you have already swiped on:
...must_not: [{ term: {swipedOnBy: profile.id}}]

`liked` is a list of all profile IDs that a given profile has liked. It is used to boost those profiles in the search results (show you people that have liked you at the top).
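To make the two uses concrete, here is a minimal sketch of what the combined search body might look like, written as a plain Python dict. The function name `build_search_body` and the `boost` value are hypothetical, not taken from our code. The `match_all` clause is there on the assumption that people who have not liked you should still show up, only lower in the results.

```python
def build_search_body(profile_id, boost=2.0):
    """Build a bool query that hides already-swiped profiles and boosts likers."""
    return {
        "query": {
            "bool": {
                # Keep every profile eligible; without a must/filter clause a
                # lone should clause would be required to match.
                "must": [{"match_all": {}}],
                # Hide profiles that this user has already swiped on.
                "must_not": [{"term": {"swipedOnBy": profile_id}}],
                # Profiles whose `liked` list contains this user score higher.
                "should": [
                    {"term": {"liked": {"value": profile_id, "boost": boost}}}
                ],
            }
        }
    }


body = build_search_body(42)
```

The resulting dict would be sent as the JSON body of a `_search` request.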

Our issue now is that we have to update two profiles for every swipe (a like or a pass on another profile), but our ES server seems to be maxing out its write queue and starts rejecting update requests (we are seeing `remote_transport_exception`).

Details:

We have ~400k profiles
swipedOnBy and liked can be as big as 5-7k ids
We currently update the profiles using a Painless script to avoid data loss (let's say we pull a profile, append the new ID to swipedOnBy, then push it back, but another instance has already added another user in the meantime; our write will overwrite that ID and lose it)
Currently running AWS.DATA.HIGHCPU.M5 with 30GB of RAM @ 7 threads with 200-item capacity
The update script has a retry count of 10 to avoid version issues (open to better ways to handle them, as this might be a key part of this issue, but when the retries are excluded we get a ton of version conflict failures)
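The data-loss scenario behind the scripted updates can be reproduced in a few lines of plain Python. This is only a simulation of the effect with an in-memory dict standing in for the stored document; it is not how ES works internally, and `append_if_absent` is a hypothetical helper.

```python
def append_if_absent(doc, field, value):
    """Mimic the scripted update: modify the current stored doc in place."""
    ids = doc.setdefault(field, [])
    if value not in ids:
        ids.append(value)


# Read-modify-write from two app instances that both read the same version.
stored = {"swipedOnBy": [1, 2]}
copy_a = list(stored["swipedOnBy"])
copy_b = list(stored["swipedOnBy"])
copy_a.append(3)                 # instance A records swiper 3
copy_b.append(4)                 # instance B records swiper 4
stored["swipedOnBy"] = copy_a    # A writes the whole list back
stored["swipedOnBy"] = copy_b    # B overwrites A's write, dropping ID 3
lost_update_result = stored["swipedOnBy"]

# Applying each append against the current stored value keeps both IDs.
stored = {"swipedOnBy": [1, 2]}
append_if_absent(stored, "swipedOnBy", 3)
append_if_absent(stored, "swipedOnBy", 4)
scripted_result = stored["swipedOnBy"]
```

In the first half ID 3 disappears; in the second half both IDs survive, which is the behavior we want from the script plus retries.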

Update script looks something like this:

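// Updating the likee (the profile that was swiped on)
if (ctx._source.swipedOnBy != null) {
  // Only append the swiper's ID if it is not already in the list
  if (!ctx._source.swipedOnBy.contains(params.swipedOnBy)) {
    ctx._source.swipedOnBy.add(params.swipedOnBy);
  }
} else {
  // First swipe on this profile: start a new list
  ctx._source.swipedOnBy = [params.swipedOnBy];
}

// Updating the liker (the profile that did the swiping)
if (ctx._source.liked != null) {
  // Only append the liked ID if it is not already in the list
  if (!ctx._source.liked.contains(params.liked)) {
    ctx._source.liked.add(params.liked);
  }
} else {
  // First like from this profile: start a new list
  ctx._source.liked = [params.liked];
}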
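For context, this is roughly how one of these scripted updates might be packaged as a request, sketched as a Python function that only builds the path and JSON body. The index name `profiles`, the function name `build_update_request`, and the 7.x-style `/_update/` path are assumptions for illustration; `retry_on_conflict` is the retry count of 10 mentioned above.

```python
# Painless source for the swipedOnBy half of the update.
SWIPED_ON_BY_SCRIPT = """
if (ctx._source.swipedOnBy == null) {
  ctx._source.swipedOnBy = [params.swipedOnBy];
} else if (!ctx._source.swipedOnBy.contains(params.swipedOnBy)) {
  ctx._source.swipedOnBy.add(params.swipedOnBy);
}
""".strip()


def build_update_request(index, doc_id, swiper_id, retries=10):
    """Return (path, body) for a scripted update of one profile document."""
    # retry_on_conflict asks ES to re-run the script on a version conflict.
    path = f"/{index}/_update/{doc_id}?retry_on_conflict={retries}"
    body = {
        "script": {
            "lang": "painless",
            "source": SWIPED_ON_BY_SCRIPT,
            # The ID goes in params so the script source stays constant.
            "params": {"swipedOnBy": swiper_id},
        }
    }
    return path, body


path, body = build_update_request("profiles", 123, 456)
```

Each swipe produces two such requests, one per profile, which is where the write volume comes from.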

Any helpful thoughts are welcome. I am new to ES, so if there is a better way to go about this I am open to suggestions. We do have a lot of updates, but I would imagine a larger ES instance should be able to handle them, as our Postgres DB currently handles these requests with no problem. Purchasing a larger instance and possibly more nodes is the obvious solution, but cost is a factor here; if that is the only option, that is good information to know as well. I should also note that we are not maxing out server resources with our current instance (RAM never over 75%, CPU never over 90%, disk storage at 3%), so it would suck if we had to double our spend just to handle more requests.

system · September 2, 2019, 7:49pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
