Elasticsearch terms query is very slow?

Elasticsearch cluster

Master node : 1 (16cpus, 128GB RAM) - 32GB is allocated for HEAP
Data nodes  : 6 (each node 16cpus, 128GB RAM) - 32GB is allocated for HEAP on all nodes

Indices : 9 
Total documents : 48,826,456
Stored size : 2TB

This query is very slow. It takes around 4 seconds.

I have profiled it and found that the terms filter on profile_id is the slow part, since the terms array contains around 20k IDs. Without the terms filter, the query takes around 700ms.

Any suggestions to improve the performance would be appreciated.

That's, uh, a lot of terms. Why do you have so many?

Yes. We have to select only the documents matching those profile_ids, based on our application logic.

Disk seeks are expensive.
Many terms = many disk seeks.
You have over 20k query terms.

Okay, is there any alternative way to fix this?

The only way to make an inherently slow task faster is through parallelism.

Use routing to organise all data related to each profile on the same machine. Send searches with routing to target specific machines with queries that only contain profile IDs that are known to be stored on those machines. Send these in parallel from your client.
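As a rough illustration of the routing idea, here is a minimal Python sketch that groups profile IDs by a routing value and builds one smaller terms query per group. The bucketing scheme (`routing_key`, `NUM_BUCKETS`, the `bucket-N` naming) is hypothetical and not something from this thread; it assumes documents were indexed with the same routing value, and that each `(routing, body)` pair is then sent with the search API's `routing` parameter. Elasticsearch hashes routing values onto shards itself, so buckets will not necessarily map one-to-one onto nodes.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 6  # assumption: roughly one routing bucket per data node


def routing_key(profile_id, num_buckets=NUM_BUCKETS):
    """Map a profile ID to a stable routing value (hypothetical scheme)."""
    # crc32 is deterministic across processes, unlike Python's built-in hash().
    digest = zlib.crc32(str(profile_id).encode("utf-8"))
    return "bucket-%d" % (digest % num_buckets)


def group_by_routing(profile_ids, num_buckets=NUM_BUCKETS):
    """Group profile IDs by the routing value their documents were indexed with."""
    groups = defaultdict(list)
    for pid in profile_ids:
        groups[routing_key(pid, num_buckets)].append(pid)
    return dict(groups)


def build_routed_searches(profile_ids, num_buckets=NUM_BUCKETS):
    """Return one (routing, request body) pair per non-empty bucket."""
    return [
        (routing, {"query": {"bool": {"filter": [{"terms": {"profile_id": ids}}]}}})
        for routing, ids in sorted(group_by_routing(profile_ids, num_buckets).items())
    ]
```

Each pair can then be issued as its own search, in parallel, so that every request carries only the IDs that can live on the shards it targets.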

Without being quite so targeted/optimised, you could try breaking a single search into several smaller bundles of profile IDs and running them in parallel. Each node may be looking for profile IDs it doesn't have, but this may help squeeze more speed out of each server if it is capable of servicing simultaneous requests well.
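The simpler bundling approach might look something like the following Python sketch. It splits the ID list into batches, builds one terms query per batch, and runs them concurrently on a thread pool. `search_fn` is a caller-supplied function (an assumption here, not a real client API) that sends one request body to the cluster and returns a list of hits; `batch_size` and `max_workers` are illustrative values that would need tuning.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def terms_body(profile_ids):
    """Build a filter-only terms query for one batch of profile IDs."""
    return {"query": {"bool": {"filter": [{"terms": {"profile_id": profile_ids}}]}}}


def parallel_terms_search(search_fn, profile_ids, batch_size=2000, max_workers=8):
    """Run one search per batch concurrently and concatenate the hits.

    search_fn(body) is supplied by the caller and must return a list of hits.
    """
    bodies = [terms_body(batch) for batch in chunked(profile_ids, batch_size)]
    if not bodies:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the submitted batches.
        results = pool.map(search_fn, bodies)
        return [hit for hits in results for hit in hits]
```

Anything that depends on the full result set, such as sorting, pagination, or aggregations, would have to be merged on the client after the batches return.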