Filtering while searching

Hi all

Let's say I have documents of the following structure:

Product {
organization;
supplier;
name;
}

and there are billions of documents, many organizations, and each organization has approximately 100,000 products. When searching for products by name within a particular organization, it seems beneficial to have Elasticsearch first filter products by organization and then, with only 100,000 products left out of billions, search by name.

I tried to accomplish this the following way:

{
   "query":{  
      "bool":{  
         "must":[  
            {  
               "term":{  
                  "name":{  
                     "value":"test"
                  }
               }
            }
         ],
         "filter":[  
            {  
               "term":{  
                  "organization":"123"
               }
            }
         ]
      }
   }
}

but I got worse performance than with this query:

{  
   "query":{  
      "bool":{  
         "must":[  
            {  
               "term":{  
                  "name":{  
                     "value":"test"
                  }
               }
            },
            {  
               "term":{  
                  "organization":{  
                     "value":"123"
                  }
               }
            }
         ]
      }
   }
}

Can anybody explain why? And any suggestions how to achieve my goal?

Filters can be cached because they only decide whether a document matches or not.
Queries (must) can't be cached because they are used to compute the score, and the score changes every time you add documents, so everything has to be recomputed.

So the very first run with a filter might be a bit slow (less so than for a query, though) and then super fast.
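One way to check where the time actually goes is the search Profile API. The request body below is a minimal sketch: it is the filter variant of the query from the question with "profile" set to true, and it would be sent to the _search endpoint of your index (the index name is not given in this thread). The response should then include a per-shard timing breakdown for each clause. It is worth running it more than once, since the first run includes warm-up costs.

```json
{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        { "term": { "name": { "value": "test" } } }
      ],
      "filter": [
        { "term": { "organization": "123" } }
      ]
    }
  }
}
```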

In short, if a part of the query is not supposed to modify the score, put it in a filter every time.

If your goal is to always filter by organization, I'd encourage you to use routing by organization so that you also search within a single shard.
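To make the routing suggestion concrete, here is a rough sketch of an index definition that requires a routing value on every document. The field names come from the question. Mapping them as keyword is an assumption, chosen because term matches exact values, and the typeless mapping format assumes Elasticsearch 7 or later. With a hypothetical index called products, this body would go to PUT /products, documents would be indexed with ?routing=123 (the organization id), and searches would use GET /products/_search?routing=123 so that only the shard holding that organization is queried.

```json
{
  "mappings": {
    "_routing": { "required": true },
    "properties": {
      "organization": { "type": "keyword" },
      "supplier": { "type": "keyword" },
      "name": { "type": "keyword" }
    }
  }
}
```

Routing only picks the shard, and several organizations can end up on the same shard, so the organization filter is still needed in the query itself.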

What do you mean by modifying the score? What I need is results that always have a score of 1.0.

If I understand you correctly, the first query in my question will always perform better than the second one, won't it?
Could you please give me a query that will perform better than the two I posted?

What do you mean by routing organization?

The first query is better.
Even better if you don't care at all about score:

{  
   "query":{  
      "bool":{  
         "filter":[  
            {  
               "term":{  
                  "name":{  
                     "value":"test"
                  }
               }
            },
            {  
               "term":{  
                  "organization":"123"
               }
            }
         ]
      }
   }
}

What do you mean by routing organization?

Thanks for the link.

As for scoring, as I said, I need a score of 1.0. As far as I know, a filter can return results even when the score is less than 1.0. So I suspect that filtering on name, which can return results that are not 100% matches, does not fit my case.

By the way, can the score differ when searching on a field that always contains exactly one token?

I need score of 1.0

Not sure if we are speaking about the same thing.

GET /_search

Gives a score of 1.0.
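If the requirement is simply that every hit carries a score of 1.0, one option is to wrap the filters in a constant_score query. This is a sketch using the field names and values from the question: nothing is scored, the clauses only decide whether a document matches, and each hit is returned with the boost value as its score (1.0 is the default, written out here for clarity).

```json
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "filter": [
            { "term": { "name": { "value": "test" } } },
            { "term": { "organization": "123" } }
          ]
        }
      },
      "boost": 1.0
    }
  }
}
```

A term clause matches a document only if the field contains that exact term, so there are no partial matches to worry about, whatever the score is.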
