Terms query uses a lot of CPU

Tamizh · May 23, 2017, 6:40am

I am using a terms query to find the followers of a user.
To do that, I fetch the follower IDs of that user and use a terms query to get the matching user documents from Elasticsearch. But when I run the query, it uses a lot of CPU. I guess that is because of the large number of terms per query.
I couldn't find any other solution for this. Any suggestions would be helpful.
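
For illustration, the query described above presumably looks something like the sketch below. The index name `users`, the field name `user_id`, and the ID values are assumptions, not taken from the original post.

 GET /users/_search
 {
     "query": {
         "terms": {
             "user_id": ["101", "102", "103"]
         }
     }
 }

Each ID in the list is a separate term to look up in the index, so the cost of the query grows with the number of followers.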

Mark_Harwood · May 23, 2017, 12:29pm

If you denormalize the data such that each user has a "follows" array you can just do this:

 GET /users/_search
 {
     "query": {
         "match": {
             "follows": "userX"
         }
     }
 }
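
A denormalized user document for this approach might look like the sketch below. The document ID, the `name` field, and the followed IDs are hypothetical. The `_doc` endpoint is the form used by recent Elasticsearch versions; older versions take a mapping type name in its place.

 PUT /users/_doc/42
 {
     "name": "userA",
     "follows": ["userX", "userY"]
 }

With this layout, finding the followers of userX is a lookup of a single term in the `follows` field, rather than one lookup per follower ID.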

Tamizh · May 23, 2017, 2:58pm

Thanks a lot for the response. I thought of the same approach. But I have 70 million users, so it would take too much time to add the follows array. Is there any other option to overcome this issue?
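
One possible way to backfill the array is in batches through the bulk API, using partial-document updates. This is only a rough sketch: the document IDs and values are hypothetical, and older Elasticsearch versions also need a `_type` in each action line.

 POST /_bulk
 { "update": { "_index": "users", "_id": "42" } }
 { "doc": { "follows": ["userX", "userY"] } }
 { "update": { "_index": "users", "_id": "43" } }
 { "doc": { "follows": ["userX"] } }

Each pair of lines is one action followed by its partial document, so a single request can update many users at once.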

Mark_Harwood · May 23, 2017, 4:03pm

It comes down to physics. Random disk seeks for lots of unique IDs are slow. SSDs will help but there's still a cost with big numbers.
Using a graph database will remove the index lookups at query time by chasing around connections using pointers, but:

- This means using a single-server solution with lots of RAM.
- All your index lookups are shifted to write time, when the database has to convert user IDs to pointers.

No easy answers here.
