How to exclude users from search properly

Hi all,

I need to exclude users who have already interacted with each other from search results. I solved it as follows:

I added an exclude_ids field to the user index. For example, when the user with id 1 searches, a Term Query checks that id 1 is not in the exclude_ids of the target users, with a query like:

{
  ...
  "must_not": {
    "term": { "exclude_ids": 1 }
  }
  ...
}
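
For reference, here is a minimal sketch of how the full request body could be built around that must_not clause. The surrounding bool query, the match_all default, and the helper name search_excluding are assumptions for illustration; the elided parts of my real query are not shown.

```python
import json


def search_excluding(searcher_id, query=None):
    """Build a search body that hides users whose exclude_ids contains searcher_id.

    `query` is whatever the normal search would be (hypothetical here);
    it defaults to match_all so the sketch is self-contained.
    """
    return {
        "query": {
            "bool": {
                "must": query or {"match_all": {}},
                # Same clause as in the post: drop every target user
                # that lists the searcher in its exclude_ids field.
                "must_not": {"term": {"exclude_ids": searcher_id}},
            }
        }
    }


body = search_excluding(1)
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the body of a _search request with any client.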

After using it I found out that:

Search is fast

  • Given how the inverted index works, I think this usage is correct
  • Search is done inside the same index (having to search in other indexes means checking more shards)

Updates are slow

  • Each time an id is added to exclude_ids, the whole document is reindexed: even a partial update through the Update API rewrites the full document internally. If the exclude_ids array gets very long, updates can become especially slow and turn into a bottleneck in the system.
  • For the same reason, indexed data that rarely changes, like name or age, is reindexed as well.
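
To make the update path concrete, this is roughly the kind of scripted update body I mean. It is only a sketch: the helper name append_exclusion_update and the parameter name new_id are made up for the example, and the body is meant for the Update API endpoint of whatever index holds the users. Even with a script like this, Elasticsearch still reindexes the whole document, so it does not remove the cost described above.

```python
import json

# Painless script that appends one id to exclude_ids if it is not there yet.
# `params.new_id` is a hypothetical parameter name chosen for this sketch.
APPEND_SCRIPT = (
    "if (ctx._source.exclude_ids == null) {"
    " ctx._source.exclude_ids = [params.new_id];"
    " } else if (!ctx._source.exclude_ids.contains(params.new_id)) {"
    " ctx._source.exclude_ids.add(params.new_id);"
    " }"
)


def append_exclusion_update(new_id):
    """Build an Update API body that adds new_id to a user's exclude_ids."""
    return {
        "script": {
            "lang": "painless",
            "source": APPEND_SCRIPT,
            "params": {"new_id": new_id},
        }
    }


update_body = append_exclusion_update(42)
print(json.dumps(update_body, indent=2))
```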

This definitely does not scale well. I thought of some solutions, but I cannot find an optimal one. I came up with these ideas:

  • Using Terms Filter Lookup could be an option, but I wonder about the performance with many terms (around 10000-50000 depending on the case).

  • Storing exclude_ids somewhere other than Elasticsearch and sending all the ids in a Terms or Ids query, but I'm worried about how that would impact the performance of the search itself.

  • Using parent/child could be another way, but this would mean indexing many documents per user, and the parent and child mappings would be very different (a lot of disparate data).
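
To illustrate the first two ideas, here is a rough sketch of the query bodies they would need. Everything named here is an assumption for the example: a separate user_exclusions index holding one small document per searcher (with the excluded ids in exclude_ids), a user_id field on the user documents, and the two helper names. Since the interactions are mutual, the list is stored per searcher rather than per target. As far as I know, recent Elasticsearch versions cap a terms query at 65536 terms by default (index.max_terms_count), so 10000-50000 ids would fit, but I have not measured how it performs.

```python
import json


def exclude_via_terms_lookup(searcher_id, lookup_index="user_exclusions"):
    """Terms lookup: Elasticsearch fetches the id list from another document.

    Assumes a document with _id == searcher_id in `lookup_index` whose
    exclude_ids field lists the user ids to hide (hypothetical layout).
    """
    return {
        "query": {
            "bool": {
                "must_not": {
                    "terms": {
                        "user_id": {
                            "index": lookup_index,
                            "id": str(searcher_id),
                            "path": "exclude_ids",
                        }
                    }
                }
            }
        }
    }


def exclude_via_ids(excluded_ids):
    """Ids query: the application loads the list elsewhere and sends it inline."""
    return {
        "query": {
            "bool": {
                "must_not": {"ids": {"values": [str(i) for i in excluded_ids]}}
            }
        }
    }


lookup_body = exclude_via_terms_lookup(1)
ids_body = exclude_via_ids([7, 8, 9])
print(json.dumps(lookup_body, indent=2))
print(json.dumps(ids_body, indent=2))
```

With the lookup variant, adding an exclusion only rewrites the small per-searcher document, so fields like name or age on the user document are left alone. The inline variant keeps Elasticsearch out of the write path entirely but makes every search request larger.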

Any ideas?
Thank you very much.
